Abstract

Enterprise-scale systems such as those used for cloud computing require a scalable and 
highly available infrastructure. One crucial ingredient of such an infrastructure is the ability 
to replicate data coherently among a group of cooperating processes in the presence of process 
failures and group membership changes. The last few decades have seen prolific research into 
efficient protocols for such data replication. One family of such protocols is the family of virtually
synchronous protocols. Virtually synchronous protocols achieve their efficiency by limiting
their synchronicity guarantee to messages that bear a causal relationship to each other. Such 
protocols have found wide-ranging commercial uses over the years. One protocol in particular, 
the CBCAST protocol developed by Birman, Schiper and Stephenson in 1991 and used in their 
ISIS platform, was particularly promising due to its unique no-wait properties, but has suffered
from seemingly intractable race conditions. In this paper we describe a corrected version of 
this protocol and prove its formal properties. 

1 Introduction


Many modern computational tasks are performed by groups of processes that are physically distributed
and are prone to failure. Such computations require efficient ways to reliably multicast
messages within the group in the presence of group membership changes. These requirements are 
becoming increasingly common in data center systems such as storage systems and server clusters. 
Meeting these requirements while maintaining good performance is challenging. There is a need to 
keep all the members of the group in a synchronized state, whatever that may mean, and there is a 
need to avoid split brains and subtle race conditions that can occur when the group is reconfigured. 

These requirements generally fall into three categories: 

Message synchrony: For the group to work coherently, at least some messages must be delivered
at different group members in the same order.

Reconfiguration atomicity: When the group membership changes, there must be some guarantee
against split brains and all the members of the new group must reach some kind of agreement
on a common initial state.

Progress guarantee: There must be some guarantee that messages get delivered to the whole
group. 

Meeting such requirements can be costly. To reduce these costs different authors have proposed 
weaker requirements within these three categories that allow for more performant systems. The 
original papers describing the process group model [?] envisioned incremental group reconfigurations,
possibly limited to one member addition or removal at a time. This makes reconfiguration
more expensive. More recent treatments of the subject assume general reconfigurations that occur
in bulk. In [3] bulk reconfiguration is joined with a flexible framework of delivery guarantees
that allows for the tailoring of these guarantees to the needs of specific applications. However, this
framework requires all message delivery to stop while the system is being reconfigured. The Rambo
system [?] is designed more specifically for a distributed atomic memory, but it allows messages
to flow while bulk reconfiguration is taking place. It also enjoys weaker synchrony guarantees that
are tailored to the specific requirements of atomic memory. 

One multicast paradigm that proved especially useful is Virtual Synchrony (see [?]). Virtual
Synchrony achieves high efficiency, albeit with some reduced availability (see [7]), by serializing
messages only when they may be causally linked. Messages that are not causally linked may
be delivered in different orders to different members of the process group. Virtually synchronous
multicast protocols have been used for a long time in many commercial and non-commercial systems,
for example Horus (see [?]) and Transis (see [T]). For a comprehensive specification of group
multicast protocols and their properties, see [?].

One truly exceptional proposal for a virtually synchronous protocol was described in [?]. The
CBCAST protocol that the authors describe in that paper not only promises virtual synchrony and 
reconfiguration atomicity without stopping message delivery, but also promises to work without 
any delivery guarantee - all message broadcasts are ship-and-pray. While this last promise is not 
explicitly elaborated on in the paper, it is what makes the CBCAST protocol exceptionally powerful. 
Unfortunately, CBCAST did not deliver on its promise. Its very complexity meant that race conditions 
could never be completely wrung out of it. The goal of this paper is to fix that protocol and provide 
a rigorous proof of its properties.

Following [5], we describe the CBCAST algorithm within a formal model that contains processes, 
channels, packets, and an opaque group membership service. Channels connect pairs of processes, 
enabling them to send point-to-point packets (we reserve the term "messages" for the entities that are
multicast by CBCAST). The group membership service (GMS) is assumed to be a primary-component
GMS (as opposed to a partitionable one, see [5]). GMS provides ordered notifications of group membership
changes. Each component of the model can fail independently. As a result, failure scenarios
can get very complex. 

We provide a rigorous proof of two essential properties of the CBCAST protocol, the Causal Order 
Property and the Progress Property. The Causal Order Property says that messages are delivered 
at each process in an order that respects the causality relationships between messages. The Progress 
Property says that if two processes in the group never halt, then any message broadcast by one is 
delivered at the other, provided that only a finite number of processes join the process group. Both 
of these properties are proved to hold under any pattern of component failure in the cluster. 

The proof is divided into a number of parts, each of which is a separate investigation. The outline 
of the proof plan is as follows: 

• The first part deals with the formal model only and is independent of CBCAST. We analyze 
the failure scenarios in the model (stop failures only) using an axiomatic approach, and show 
that under reasonable assumptions on the behavior of the GMS all partial failure cases are
equivalent to either the failure-free case or to a simultaneous failure of all the components. In
essence the group behaves like a single fault domain. This frees us to carry out the rest of the
proof under the assumption that no failures ever occur. A critical ingredient in the analysis
is the fact that the formal act of removing a process from the group by the GMS can mask the
stop-failure of a process. 

• In the second part we give a detailed description of the CBCAST protocol. The version described 
here is not the most general and certainly not the most efficient. Our goal here is to achieve 
maximum simplicity in order to facilitate a clear analysis. Some parts of the protocol that
deal with the admission of processes into the group are new, specifically the steps of state
donation and co-donation.

• The third part deals with processes that are admitted to the group by GMS (as opposed to
processes that are in the group from the start). We prove the rather surprising fact that a
process admission event can be reduced to a process removal event. To do that we construct
an explicit mapping from a history that contains an admission of a process G to a history
that contains a removal of an "opposite" process -G. We show that the two histories carry the
same computation and share the same progress and causality order properties. As a result
we can get rid of any finite number of GMS admission events.

• The fourth part is the proof of the Causal Order Property and the Progress Property in the 
special case where no new processes are ever admitted to the group beyond the original members.
A central idea in the proof is the concept of effective routes. The CBCAST protocol allows
a message to be transmitted from source to target multiple times and through multiple routes.
However, only one of the routes is effective and all the others are redundant. Concentrating on
the effective routes simplifies the analysis a great deal. 

Our model makes a number of assumptions on the behavior of the group membership service. There
are many different implementations of such a service in the literature. Such a service is sometimes
referred to as a reconfiguration service. See for example [?]. Virtually all such services are based
on the Paxos protocol (see [?]). Elsewhere we describe a detailed implementation and analysis
of a compliant (primary-component) group membership service (see [2]).


2 The Underlying Computational Model 

2.1 Introduction 

In this section we create a detailed, axiomatic computational model in which the CBCAST protocol 
executes. The purpose of this model is to create a context that is rich and precise enough for 
us to carry out two of the core arguments in this paper. First, that the failure model of a group
computation with a group membership service and stop faults is equivalent to the failure model
of a single process, where the only failure is an instantaneous failure of the whole system. Second,
that any group computation that is based on CBCAST where a process G joins in the middle of the
computation is identical to a computation carried out by a similar group that includes the process G
from the start. The first argument allows us to carry out our analysis of the properties of CBCAST
without taking failures into consideration. The second allows us to carry out the analysis under the
assumption that processes never join the group during the computation.

The first argument is carried out in the current part. We use the interface points between different 
components in the model as a place to "shift the blame" from channel and GMS failures to process
failures. Then we hide the process failures by subsuming them into the group membership service 
itself. To put this last shift in other words, if a process is officially removed by the group membership 
service and halts at the same time, then we can analyze the event as a pure group membership 
service event rather than as two separate events (a group membership service event and a component 
failure). To make this intuitive, process removal and process halting have to occur simultaneously. 
The notion of simultaneity requires some work since our model does not include a notion of time. 
We perform a careful analysis of the partial order that exists between events in the model to show 
that they can be laid along a timeline in such a way that desired events (such as process removal 
and process halting) occur at the same time. 

To make blame shifting possible, we model the channels as including a queuing stage at the sending 
and receiving ends of the channel - in other words we add a send queue and a receive queue to each 
channel. A channel that fails to ship a packet to its destination can shift the blame to the sending 
process by claiming that the faulty packet never left the send queue. We add a similar queue for 
GMS notifications. This queuing construct is not as artificial as it may sound. Queuing is inherent 
to communications networks. The sender/channel and channel/receiver boundaries are inherently 
blurry. 

The main technical difficulty in the analysis is keeping infinities from creeping in. If we try to shift 
the blame for an infinite number of failures to a single component of the model, we end up with 
the absurd conclusion that an infinite number of events happen in a finite period of time. It is for 
this reason that the analysis proceeds by dealing with each class of failures in one fell swoop rather 
than dealing with failures one at a time. 

2.2 Model Overview


We assume a group of processes - the group having a set of initial members to which at various 
times new processes can be added, while processes that are already in the group can be removed. 
We do not care how the decision to add or remove processes is made. We assume that there is an 
opaque group membership service (GMS for short) that notifies all the current member processes of
any addition or removal of a process. The notifications for each process are appended by GMS to a 
GMS receive queue. Eventually the process dequeues each notification in order and processes it. We 
do not assume that the membership service is reliable - notifications can stop arriving at an eligible 
process. 

We assume that the processes can communicate with each other by exchanging packets on point-to- 
point, unidirectional channels. We assume that the packets in each channel arrive in the order they 
were sent, i.e. the channel between the source and target is FIFO. The channels are not assumed 
to be reliable. When a process P wants to send a packet to a process Q, it appends the packet to 
the send queue of the outbound channel between P and Q. Eventually an opaque driver dequeues 
the packet and sends it through the channel. When Q receives a packet from P, the packet is 
appended to the receive queue of the inbound channel by an opaque driver. Eventually the process 
dequeues each packet and processes it. To make the model more symmetrical we assume that there 
is a channel, called the self-channel, between each process and itself. 
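
The channel structure described above can be illustrated with a minimal Python sketch. The class name `Channel`, the `failed` flag and the `drive` method are assumptions made for this illustration and are not part of the formal model. The sketch shows a send queue, a receive queue and an opaque driver step between them. When the channel stops, the undelivered packets are still sitting in the send queue, which is what lets the analysis treat them as never having left the sender.

```python
from collections import deque


class Channel:
    """Unidirectional FIFO channel with a send queue and a receive queue."""

    def __init__(self):
        self.send_queue = deque()
        self.recv_queue = deque()
        self.failed = False  # stop failure: once set, nothing moves again

    def enqueue(self, packet):
        # Called by the source process; always succeeds locally.
        self.send_queue.append(packet)

    def drive(self):
        # Opaque driver step: move one packet across the channel, in order.
        if self.failed or not self.send_queue:
            return False
        self.recv_queue.append(self.send_queue.popleft())
        return True

    def dequeue(self):
        # Called by the target process; returns None when nothing has arrived.
        return self.recv_queue.popleft() if self.recv_queue else None


ch = Channel()
for p in ("k1", "k2", "k3"):
    ch.enqueue(p)
ch.drive()
ch.failed = True   # the channel stops after shipping k1
ch.drive()
received = [ch.dequeue(), ch.dequeue()]
stuck = list(ch.send_queue)
```

In this run only `k1` reaches the target, while `k2` and `k3` remain in the send queue at the source.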

We assume that each process executes some code. This code is made up of a protocol PROTOCOL and 
an application APP. The application is an arbitrary execution thread that makes message multicast 
requests to communicate with other processes. Each message multicast request is appended to the 
APP receive queue of the process. Eventually the process dequeues each message and multicasts it. 

APP does not directly specify the set of recipient processes of each multicast. Doing so is impossible 
since the roster of processes changes over time and messages are multicast asynchronously rather 
than immediately. Instead, APP must specify a functional set of recipients and leave it to PROTOCOL
to decide who the members of the functional set are at the moment that it performs the multicast.

In general many functional sets can be specified for a multicast. For example if a natural order 
relation exists between all possible processes (e.g. an order on their identifiers) then the smallest 
member process, largest member process, or two smallest processes can be specified as functional
sets. If the processes have distinguishing characteristics (e.g. colors) then the blue processes or 
ultraviolet processes can be specified. 

In our analysis we do not assume any order or distinguishing characteristics. This leaves us with 
only one interesting functional set, namely the universal set of all member processes. A multicast 
to the universal set is called a broadcast. This is not really a limitation because functional sets can 
be specified by APP in the body of each message. Once the message is received by a process, the
receiving process can decide on its own whether it belongs to the specified set. If it does not it can
simply ignore the message. 
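
A minimal sketch of this receiver-side filtering follows. The `Message` type, its `in_set` predicate field and the `deliver` helper are hypothetical names introduced only for this example. The message is broadcast to every member, and each receiver decides locally whether it belongs to the functional set named in the message body.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    payload: str
    # Hypothetical field: predicate describing the functional set of recipients.
    in_set: Callable[[int], bool]


def deliver(pid, msg, inbox):
    """Receiver-side filter: keep the broadcast only if pid is in the set."""
    if msg.in_set(pid):
        inbox.append(msg.payload)


members = [3, 7, 12]
# The functional set here is "processes with an odd identifier".
msg = Message("rebalance", in_set=lambda pid: pid % 2 == 1)
inboxes = {pid: [] for pid in members}
for pid in members:  # a broadcast reaches every current member
    deliver(pid, msg, inboxes[pid])
```

Process 12 receives the broadcast like everyone else but ignores it, so its inbox stays empty.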

PROTOCOL is a set of non-blocking callbacks that the process uses to process the packets, notifications
and message broadcast requests that it dequeues from its various receive queues. We assume that
the processing of each received item - packet, notification or message broadcast request - completes
without any context switching, meaning that the items are processed one at a time; that items that
arrive at the same queue are processed in the order in which they were queued; and that APP does
not progress while an item is processed.

We assume that each process has a state. The state can be thought of as the totality of values of 
all the variables that are managed by PROTOCOL. We assume that APP can read some of this state, 
but cannot change any of it directly. This does not mean that APP has no private state of its own, 
however we do not model it. We will see that at least in the case of CBCAST, and plausibly in 
the general case, APP is started from scratch at each new process and therefore we can treat it as 
stateless for the purpose of our analysis. 

Since the state is only changed by PROTOCOL, the state of a process when it dequeues the next packet, 
notification or message broadcast request is identical to the state it had after it finished processing 
the previous packet, notification or message broadcast request. The initial group of processes all 
start their life with an identical initial state. Any process that is subsequently admitted into the 
group by GMS starts life with a state that is identical to the state of a parent which must be an
existing group member. The parent is selected by GMS. To be more accurate, the new process starts
life by dequeuing the GMS notification of its own joining. At that moment it has the same state that
its parent has when it dequeues the same notification from GMS. This makes the act of dequeuing
the notification similar to the execution of a fork() system call in UNIX. 

There are a few types of events in our model. Each event occurs at a specific process and can be 
either a queuing event, also called a side effect event, or a dequeuing event, also called a trigger 
event. 

A trigger event occurs when any item is dequeued from some receive queue. This includes packet, 
notification and message broadcast request dequeuing events. Such an event triggers processing, 
using the appropriate callback supplied by PROTOCOL. 

A side effect event occurs when the process multicasts information by appending one or more 
identical packets to the send queues of outbound channels, destined to different processes. Such an 
event always occurs as a result of the execution of a callback, and is therefore a side effect of that 
execution. In other words, the side effects of a trigger are determined by the current state of the 
process and by the implementation of PROTOCOL. APP cannot initiate a queuing event directly, but 
only through the issue of a message broadcast request.

The various send queues of outbound channels; the receive queues of inbound channels; and the 
receive queues of notifications and message broadcast requests form an integral part of a process in 
our model. The inclusion of queuing as part of the model is in recognition of the fact that queuing 
is a fundamental property of any communication mechanism and is not an artifact of any particular 
implementation. The presence of queuing blurs the boundary between processes and channels
and between processes and GMS. This insight is crucial in reasoning about failures.

Some events have actual or potential causal relations between them. This is captured by a partial
order on events. The queuing event of a packet always precedes its dequeuing event. In addition, 
the events at a specific process are linearly ordered, capturing the assumption of a single threaded 
processing of trigger events. In fact, at every process the sequence of events can be broken into 
intervals composed of a single trigger followed by a finite sequence of the zero or more side effects 
that are caused by the processing of the trigger. We refer to such a sequence as a transaction. 

The structure of the transactions that compose the event set of each process is determined by 
PROTOCOL. Each trigger event causes a PROTOCOL-specific callback to be executed. The callback can
generate an arbitrary number of queuing events.

While the number of events may be infinite over the life of a process, there is only a finite number 
of events preceding each particular event. 
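
The transaction structure can be illustrated with a small single-threaded loop. The `Process` class, its merged `triggers` queue and the toy callback `echo_twice` are assumptions made for this sketch and are not the paper's definitions. Each call to `step` performs one trigger event followed by the side effects of its callback, so the event list of the process is a concatenation of transactions.

```python
from collections import deque


class Process:
    def __init__(self, callback):
        self.triggers = deque()   # simplified: one merged receive queue
        self.callback = callback  # stands in for a PROTOCOL callback
        self.state = 0
        self.events = []          # linear order of events at this process

    def step(self):
        """Run one transaction: a single trigger, then its side effects."""
        item = self.triggers.popleft()
        self.events.append(("trigger", item))
        self.state, outgoing = self.callback(self.state, item)
        for packet in outgoing:
            self.events.append(("side_effect", packet))


def echo_twice(state, item):
    # Toy callback: count the item and queue two packets for it.
    return state + 1, [f"{item}/a", f"{item}/b"]


p = Process(echo_twice)
p.triggers.extend(["m1", "m2"])
p.step()
p.step()
```

After two steps the event list holds two transactions, each a trigger followed by two side effects, and the state has changed only inside the callback.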


2.3 The Formal Model 

2.3.1 Ingredients of the model 


P

The set of all processes.

$P_h$

The set of all halting processes. $P_h \subseteq P$.

$\mathfrak{V}$

The number of views. $0 < \mathfrak{V} \le \infty$.

$S_i$, where $i < \mathfrak{V}$

The set of members of the $i$-th view. $S_i$ is a finite set of processes. We will freely refer to both
$S_i$ and its index $i$ as a "view".

K 

The set of all packets. A packet k has a content which is denoted by cont(k) and is protocol
and application specific. A packet is sent from a source process and received by a target 
process. Due to faults, a packet may fail to be sent or received. So in general a packet is 
either sent and received, or sent and not received, or not sent. In the second case, where the 
packet is sent but not received, we say that the packet is dropped. 

By abuse of notation we will say k = c when we mean cont(k) = c.

F 

The set of all GMS notifications. For each view $i$ and each process $P$ there is at most one
notification for view change $i$ that is supposed to be received at $P$. We denote this notification
by $v_i(P)$. A notification has a content which is made up of the view change type (join or
removal), the identity of the process that is joining or is being removed, and possibly the
identity of the parent process (in case of a join). We denote the content of the notification by
$\mathrm{cont}(v_i(P))$. The possible contents for a notification are:

$n_{REM}(pid)$

Process pid is removed.

$n_{JOIN}(pid, p\_pid)$

Process pid is joining, and its parent is the existing member p_pid.

$n_{START}(pid)$

You (the recipient of the notification) are the new member of the group, and your
identifier is pid.

$n_{STOP}$

You (the recipient of the notification) are no longer a member of the group.

Due to faults, a notification may fail to be received by its target process. In such a case we
say that the notification is dropped.

By abuse of notation we will say $v_i(P) = c$ when we mean $\mathrm{cont}(v_i(P)) = c$.

A 

The set of all message broadcast requests from APP. A message broadcast request r has
a content which is denoted by cont(r) and is application specific. Since these requests are 
generated locally we assume that they are never dropped. 

$PQ$

The unidirectional channel from a source process $P$ to a target process $Q$. A channel is a set
of packets. $PQ \subseteq K$.

$E, \preceq$

The set of all events, partially ordered by the $\preceq$ order relation. If $e \preceq f$ we say that event $e$
precedes event $f$, or that $e$ is earlier than $f$ and $f$ is later than $e$.

The $\preceq$ relationship is a weak partial order, meaning that in some cases both $e \preceq f$ and $f \preceq e$
can hold at the same time for $e \neq f$. We will indicate such cases by $e \asymp f$ and say that $e$
and $f$ are contemporaneous.

$E_P$

The set of all events that occur at process $P$. If this set is empty we say that $P$ is uninitialized.

$A_P$

The set of all requests that APP issues at process $P$.

GMS 

A Group Membership Service that delivers view change notifications to the processes. 

PROTOCOL 

A protocol that determines how triggers are processed and what side effects they create. 

APP 

A user application that generates message broadcast requests at various processes. 


Events in the model 

$k^{qu}$

The packet $k \in PQ$ is appended to the send queue of the channel. This event occurs at $P$.

$k^{dq}$

The packet $k \in PQ$ is removed from the receive queue of the channel and processed. This
event occurs at $Q$.

$v_i(P)^{dq}$

The notification $v_i(P)$ is removed from the receive queue at process $P$ and processed. This
event occurs at $P$.

$r^{dq}$

The message broadcast request $r \in A_P$ is removed from the APP receive queue and processed.
This event occurs at $P$.

The set $E_P$ includes all the packets queued by $P$, all the packets dequeued by $P$, all the view
notifications dequeued by $P$, and all the message broadcast requests dequeued at $P$:

$$E_P = \{k^{qu} \in E \mid \exists Q\,(k \in PQ)\} \cup \{k^{dq} \in E \mid \exists Q\,(k \in QP)\} \cup \{v_i(P)^{dq} \in E\} \cup \{r^{dq} \in E \mid r \in A_P\}$$


2.3.2 The PROTOCOL interface 

We mentioned in the introduction that each notification, dequeued packet, and message broadcast
request has to be processed somehow. The processing is mostly controlled by PROTOCOL which 
consists of a small number of non-blocking calls that we will describe below. 

When APP wishes to broadcast a message m it invokes the PROTOCOL call protBroadcast(m). 

When a brand new process group Grp is initialized, the PROTOCOL call protStart(Grp, P) must be
invoked manually at each member process $P \in Grp$.

When a notification n is dequeued at a process P from the notification queue, some pre-processing
has to occur before PROTOCOL can take over. First of all, removal notifications have to be separated
from join notifications.

If the notification is the removal notification of a process R then the PROTOCOL call protRemove(R)
is invoked.

If the notification is a join notification of a process J with parent E, the process P has to determine
whether it is the designated parent of the joining process. If P is not the designated parent ($P \neq E$)
then it invokes the PROTOCOL call protJoin(J, E). If P is the parent then it must first fork the
new process, invoke protJoin(J, E) in the parent, and call protRun(J) in the child process to
initialize J. This is summarized in the pseudo-code below.


Procedure doNotification(n)
  Input: n is the notification being processed

  if n = n_REM(R) then
    protRemove(R);
  end
  else if n = n_JOIN(J, E) and E ≠ self then
    protJoin(J, E);
  end
  else
    // the local process is the parent
    switch fork() do
      case parent:
        // this block is executed when fork() returns in the parent
        protJoin(J, E);
      endsw
      case child:
        // this block is executed when fork() returns in the child
        protRun(J);
      endsw
    endsw
  end
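
The dispatch logic of doNotification can also be sketched in Python. This is an illustration under stated assumptions rather than the paper's implementation: the `Rem`, `Join` and `Proc` classes are hypothetical, the PROTOCOL calls are replaced by stubs that only log their invocation, and fork() is modeled by a deep copy of the parent's state instead of an operating system call.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Rem:
    pid: str


@dataclass
class Join:
    pid: str
    parent: str


@dataclass
class Proc:
    me: str
    log: list = field(default_factory=list)

    # Stand-ins for the PROTOCOL calls; they only record the invocation.
    def prot_remove(self, r):
        self.log.append(("remove", r))

    def prot_join(self, j, e):
        self.log.append(("join", j, e))

    def prot_run(self, j):
        self.me = j  # the child acquires its own identity
        self.log.append(("run", j))


def do_notification(proc, n):
    """Dispatch one dequeued notification; return the child if one is forked."""
    if isinstance(n, Rem):
        proc.prot_remove(n.pid)
        return None
    if n.parent != proc.me:
        proc.prot_join(n.pid, n.parent)
        return None
    child = copy.deepcopy(proc)  # models fork(): child starts with parent's state
    proc.prot_join(n.pid, n.parent)
    child.prot_run(n.pid)
    return child


e = Proc("E")
do_notification(e, Rem("R"))
child = do_notification(e, Join("J", "E"))
```

The child inherits the parent's history up to the join notification and then diverges: the parent logs a join, the child logs a run.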


When a received packet k from a source process S is dequeued at a target process T from the
receive queue of the incoming channel, the process invokes the PROTOCOL call protPacket(k, S).

All in all the following six non-blocking calls must be implemented by PROTOCOL: 

protBroadcast(m) 

This call is issued by APP when it wishes to broadcast a message m. 

protStart(roster, P) 

This call is issued manually at each initial member process when the group is initialized at
the start of view zero. roster is the set of initial members and $P \in$ roster is the identifier of the
process at which the call is issued. While GMS does not issue join notifications to the initial
members at view zero, this call appears as if it were issued in response to such a notification.

protRun(P) 

This call is issued at a new process right after it is forked from its parent. P is the identifier 
of the new process. At the moment of forking the new process is identical to its parent and 
this call is the means by which the new process acquires an independent identity. While GMS
does not issue a join notification to the joining member, this call appears as if it were issued 
in response to such a notification. 

protRemove(P) 

This call is issued in response to a removal notification from GMS. P is the identifier of the
removed process.

protJoin(P, E)

This call is issued in response to a join notification from GMS. P is the identifier of the joining 
process and E is the identifier of its parent. 

protPacket(k, S) 

This call is issued when a packet is dequeued from a receive queue. k is the received packet
and S is the process identifier of the sender of the packet. 
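
One way to express the six calls as a programming interface is an abstract base class. The sketch below is an assumption about how such an interface might look in Python. The class name `GroupProtocol` and the snake_case method names are invented for the example; only the set of six calls and their parameters comes from the text.

```python
from abc import ABC, abstractmethod


class GroupProtocol(ABC):
    """The six non-blocking calls that a PROTOCOL must implement."""

    @abstractmethod
    def prot_broadcast(self, m): ...

    @abstractmethod
    def prot_start(self, roster, p): ...

    @abstractmethod
    def prot_run(self, p): ...

    @abstractmethod
    def prot_remove(self, p): ...

    @abstractmethod
    def prot_join(self, p, e): ...

    @abstractmethod
    def prot_packet(self, k, s): ...


class Incomplete(GroupProtocol):
    # Implements only one of the six calls.
    def prot_broadcast(self, m):
        pass


try:
    Incomplete()
    instantiable = True
except TypeError:
    instantiable = False
```

With this design an implementation that omits any of the six calls cannot be instantiated, which mirrors the requirement that all of them be provided.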


2.3.3 The APP interface 

The user application APP runs in its own thread at each process. Its only means for communicating 
with other processes is the protBroadcast call. With the help of PROTOCOL, APP can have view
change notifications and messages from other processes delivered to it. In order to facilitate these
deliveries, APP must implement a small number of non-blocking callbacks that are executed by
PROTOCOL within the context of PROTOCOL calls. As a result these callbacks execute outside the
context of the main APP thread.

We assume that these callbacks are used by APP to manage an opaque (APP-dependent) data structure
ReplicatedData. This data structure can be manipulated by the callbacks, but the main APP
thread can only read that structure and not change it. In order to allow ReplicatedData to be
initialized, APP must provide an initialization callback.

The callbacks are not able to invoke the protBroadcast call. Only the main thread can do that.
Specifically, APP must obey the following rules: 

1. APP must implement the following callback functions: 

• GroundState(): This call creates the initial value of ReplicatedData. It is called when
a member of view zero is initialized, thus guaranteeing that the replicated data will
start with coherent values at all the initial processes (the meaning of "coherent" here is
APP-specific. It can simply mean "identical", but it can also indicate a more complex
relationship).

• ApplyMessage(msg, originator): This call applies a message msg from process originator
to ReplicatedData. One possible way to apply the message is to append it to a delivery
log (see [?]).

• ApplyJoin(pid): This call applies the notification that a process with identity pid has
joined the group. In [3] such notifications are appended to the delivery log. 

• ApplyRemoval(pid): This call applies the notification that the process with identity pid 
has been removed from the group. In [3] such notifications are appended to the delivery 
log. 

2. APP must implement a Main(pid) function. This is the main application thread. Its only 
parameter is the local process identity. This implies that when APP is started at a new 
process, it has no context except for the local process identity and the information that is 
available to it through ReplicatedData. 

3. The Main() function of APP may invoke the protBroadcast procedure but the callbacks may
not. 

4. The Main() function has read-only access to ReplicatedData. It may manage additional data 
outside of ReplicatedData without any restrictions. 

5. The callbacks listed above have read and write access to ReplicatedData. They have access to 
no other information. In particular they do not know the identity of the local process. 

6. The Main() function runs in its own thread. The callbacks are called in the context of a 
critical section, and therefore must not block. 
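
A toy APP that follows these rules might look as follows. This is a sketch under assumptions: the class `LogApp` is hypothetical, ReplicatedData is taken to be an append-only delivery log, and the broadcast call is passed to the main function as a parameter, which is a choice made for this example and not something the model prescribes.

```python
class LogApp:
    """Toy APP whose ReplicatedData is an append-only delivery log."""

    def ground_state(self):
        # GroundState(): identical empty log at every view-zero member.
        self.replicated_data = []

    def apply_message(self, msg, originator):
        # ApplyMessage(msg, originator)
        self.replicated_data.append(("msg", originator, msg))

    def apply_join(self, pid):
        # ApplyJoin(pid)
        self.replicated_data.append(("join", pid))

    def apply_removal(self, pid):
        # ApplyRemoval(pid)
        self.replicated_data.append(("removal", pid))

    def main(self, pid, prot_broadcast):
        # Main(pid): may broadcast, but only reads replicated_data.
        n = len(self.replicated_data)
        prot_broadcast(f"hello from {pid}, log length {n}")


app = LogApp()
app.ground_state()
app.apply_join("J")
app.apply_message("m1", "P")
sent = []
app.main("P", sent.append)  # a list stands in for protBroadcast
```

The callbacks never call the broadcast function and never see the local process identity, while the main function reads the log without modifying it.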

2.3.4 Model axioms and histories 

View Interval Axiom

A process P is a member of at least one view, and the set of views of which it is a member is an
unbroken interval, called the view interval, which is either finite or infinite. Formally

$$\{i \mid P \in S_i\} = \{i \mid j(P) \le i < r(P)\}$$

If $j(P) > 0$ then we say that $j(P)$ is the join view of $P$. If $j(P) = 0$ then we say that $P$ is original.

If $r(P) < \mathfrak{V}$ then we say that $r(P)$ is the removal view of $P$. If $r(P) = \mathfrak{V}$ then we say that $P$ is
not removed.


View Change Axiom 

View zero, which contains the initial group members, is finite. Every subsequent view differs from 
its predecessor by the addition or removal of a single process. As a result, each view other than 
view zero is the join view or the removal view of exactly one process. 
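
For a finite prefix of the view sequence, the two view axioms can be checked mechanically. The function `check_views` below is a hypothetical helper written for this illustration. It verifies that consecutive views differ by exactly one process and that no process rejoins after removal, and it returns the join view and removal view of every process, with None standing for "not removed" within the prefix.

```python
def check_views(views):
    """Return {process: (join_view, removal_view)} for a list of views.

    removal_view is None if the process is never removed in this prefix.
    Raises ValueError if a view change is not a single join or removal, or
    if a process rejoins after removal (its view interval would be broken).
    """
    span = {p: [0, None] for p in views[0]}
    for i in range(1, len(views)):
        changed = views[i] ^ views[i - 1]  # symmetric difference
        if len(changed) != 1:
            raise ValueError(f"view {i} changes {len(changed)} processes")
        (p,) = changed
        if p in views[i]:  # join
            if p in span:
                raise ValueError(f"{p} rejoins in view {i}")
            span[p] = [i, None]
        else:  # removal
            span[p][1] = i
    return {p: tuple(s) for p, s in span.items()}


views = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"B"}]
spans = check_views(views)
```

Here A and B are original, C joins in view 1, A is removed in view 2 and C in view 3, so every view after view zero is the join or removal view of exactly one process.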


Channel Axiom 

Every packet belongs to exactly one channel. If $k \in PQ$ we say that $P$ is the source of $k$ and $Q$ is
the target of $k$.


Packet Event Ax iom 

Every packet k € PQ has a single queuing event € Ep and at most one dequeuing event 
k’"^ G Eq . If the packet has a dequeuing event then 

k’^'^ -< k^^ 

in other words k is queued (at the sending process) before it is dequeued (at the receiving process). 
If ki ^ k 2 then ki^^ ^ k 2 ^^ and ki^^ ^ k 2 ‘^^, wherever these events exist. 

If e = then the set of packets Me = {k' \ k'‘^^ = ej is finite. Me is called the multicast set of e. 
All the packets in Me have identical contents and share the same source P, but they must all have 
different targets. In other words, no two of them belong to the same channel. The set of target 
processes $T_e = \{Q \mid \exists k' \in M_e\,(k' \in PQ)\}$ is called the target set of e. 


Packet Order Axiom 

Channels are FIFO. Precisely, if 

• packets k and k' belong to the same channel 

• k is queued before k', that is $k^{nq} \prec k'^{nq}$ 

• $k'^{dq}$ exists 

then $k^{dq}$ exists and $k^{dq} \prec k'^{dq}$. 
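
Read operationally, the Packet Order Axiom says that the packets dequeued from a channel form a prefix of the packets queued on it, in the same order. The Python sketch below checks this for a single channel. The function name and the representation of a channel as two lists of packet identifiers are assumptions made for the example.

```python
from typing import Sequence


def respects_packet_order(queued: Sequence[str], dequeued: Sequence[str]) -> bool:
    """Check the FIFO property for one channel.

    queued   -- packet identifiers in the order of their queuing events
    dequeued -- packet identifiers in the order of their dequeuing events
    """
    # If a later packet is dequeued, every earlier packet must already have
    # been dequeued, so the dequeued list has to be a prefix of the queued list.
    return list(dequeued) == list(queued[: len(dequeued)])
```

A gap (a later packet dequeued while an earlier one is not) and a reordering are both rejected by the same prefix test.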


GMS Axiom 

The following GMS notifications exist in the model 

• For each process P and each $j(P) < i < r(P)$ there is exactly one notification $v_i(P)$. 

— If i is the join view of process J (with parent E) then Vi{P) = 

— If i is the removal view of process R then Vi{P) = n^^yi{R). 

• For each process P with $j(P) > 0$ there is exactly one notification $v_{j(P)}(P) = n_{START}(P)$ 

• For each process P with $r(P) < \mathfrak{V}$ there is exactly one notification $v_{r(P)}(P) = n_{STOP}$ 

We also add the following artificial notifications that do not relate to actual GMS notifications: 

• For each original process P we add a notification $v_0(P) = n_{START}(P)$ 

• For each process that halts and is not removed we add a notification $v_\infty(P) = n_{STOP}$ 

Notification Event Axiom 

If $v_i(P)$ exists, there is at most one $v_i(P)^{dq}$ event. If $i = 0$ and P is original then $v_i(P)^{dq}$ exists. 

If $v_{j(P)}(P)^{dq}$ exists, we will use the shorthand $P_{run} = v_{j(P)}(P)^{dq}$. 

If $v_{r(P)}(P)^{dq}$ exists, we will use the shorthand $P_{hlt} = v_{r(P)}(P)^{dq}$. 

Notification Order Axiom 

View notifications are dequeued in order. Precisely, if $j(P) < i < i' < r(P)$ and $v_{i'}(P)^{dq}$ exists 
then 

1. $v_i(P)^{dq}$ exists. 

2. $v_i(P)^{dq} \prec v_{i'}(P)^{dq}$ 


Parent Axiom 

If J joins in view $j(J) > 0$, E is its parent, and $J_{run}$ exists, then $v_{j(J)}(E)^{dq}$ also exists and 
$v_{j(J)}(E)^{dq} \asymp J_{run}$. In other words a new process would not instantiate unless its parent has 
processed the notification that announces its joining, but we model the two events as being 
contemporaneous. 


Process Order Axiom 

The precedence order $\preceq$ is a linear order at each process P. In other words any two events in $E_P$ 
are $\preceq$-comparable. 


Process Liveness Axiom 

A process P does not queue a packet to send to another process Q unless Q appears live to P. In 
other words, if $k \in PQ$ then the following two conditions must be met: 

1. $Q \in S_{j(P)}$ or, $v_{j(Q)}(P)^{dq} \in E_P$ and $v_{j(Q)}(P)^{dq} \prec k^{nq}$ 

2. If $v_{r(Q)}(P)^{dq} \in E_P$ then $k^{nq} \prec v_{r(Q)}(P)^{dq}$ 

Take note that this axiom is a statement about the behavior of PROTOCOL, since its callbacks are 
the only elements of the model that generate queuing events. 


Piggyback Axiom 

Packets are processed in the same or higher view than the one at which they are queued.¹ In other 
words, if $k \in PQ$ and $k^{dq}$ exists, then for any i, if $i < j(P)$ or $v_i(P)^{dq} \prec k^{nq}$ then $i < j(Q)$ or 
$v_i(Q)^{dq}$ exists and $v_i(Q)^{dq} \prec k^{dq}$. 


Self Channel Axiom 

Packets on self channels are processed early. If P is a process, $k \in PP$ is a packet and i is a view 
such that $v_i(P)^{dq}$ exists and $k^{nq} \prec v_i(P)^{dq}$ then $k^{dq}$ exists and $k^{dq} \prec v_i(P)^{dq}$. 


Request Event Axiom 

If request $r \in A_P$ exists, there is at most one $r^{dq}$ event. 


Order Foundation Axiom 

The $\preceq$ order in $E_P$ is very well founded, meaning that every event is preceded by only a finite 
number of earlier events. Formally, for any event $e \in E_P$: 

$|\{f \in E_P \mid f \preceq e\}| < \infty$


If $E_P \ne \emptyset$ then $P_{run}$ exists and is the first element of $E_P$. 
If $P_{hlt}$ exists then it is the last element of $E_P$. 


Minimal Order Axiom 

The order $\prec$ is the minimal order generated by the order relations at each process and by the orders 
stipulated by the Packet Event Axiom and the Parent Axiom. 

¹For the Piggyback Axiom to hold the implementer has to "piggyback" the latest view change information on every packet 
that is sent, in case this information is missing at the receiving side. 


First Halting Axiom 

A halting process P has a finite event set. In other words, if P halts then $|E_P| < \infty$. 


Second Halting Axiom 

Let P be a process and let i be a finite integer in the interval $j(P) < i < r(P)$. Then 

• If $v_i(P)$ is dropped then $v_i(P)^{dq}$ does not exist. 

• If P does not halt and $v_j(P)$ is not dropped for any $j \le i$ then $v_i(P)^{dq}$ exists. In other words 
if P does not halt then it dequeues all the notifications that it is legally allowed to process. 


Third Halting Axiom 

Let P and Q be processes and let $k \in PQ$ be a packet. Then 

• If P does not halt then k is sent. In other words, a non-halting process eventually sends all 
the packets that it queues to its send queues. 

• If k is not received then $k^{dq}$ does not exist. 

• If 

— P and Q do not halt 

— Every packet $k' \in PQ$ where $k'^{nq} \preceq k^{nq}$ is received 

then $k^{dq}$ exists. In other words in the absence of a gap or stoppage, a packet in a channel is 
eventually dequeued and processed. 


Fourth Halting Axiom 

Let P be a process and let $r \in A_P$ be a message broadcast request. If P does not halt then $r^{dq}$ 
exists. In other words a non-halting process eventually dequeues and processes all of its message 
broadcast requests. 

Definition 1. A particular vector of values (P, P^, K, F, A, E, that satisfies the model axioms is 
called a history. 


2.4 Event Order and Time 

The group computation model that we presented in the previous section does not include a notion 
of time. But it does include a notion of causality that is embodied by the partial order on events. 
Furthermore, any instance of group computation, embodied by the notion of a history, does unfold 
in physical time. How is physical time related to event order? Intuitively, effects always follow 
causes in time. Also, within any finite interval of time only a finite number of events can occur. 
Beyond that we can say nothing. In other words, given any physical group computation in our 
model there should be a timestamp mapping 

$\mathrm{time} : E^H \to \mathbb{R}$

where H is the history of the computation, $\mathbb{R}$ is physical time as represented by the real numbers, 
and time(e) is the physical time at which the event e occurs. The mapping time(·) must have the 
following properties: 

1. $e \asymp f \implies \mathrm{time}(e) = \mathrm{time}(f)$ 

2. $e \prec f \implies \mathrm{time}(e) < \mathrm{time}(f)$ 

3. $|\{e \in E^H \mid \mathrm{time}(e) \le r\}| < \infty$ for all $r \in \mathbb{R}$ 

Conversely, any arbitrary timestamp mapping time(·) that meets the above criteria should be 
realizable by a physical computation that yields the history H. One simply has to assume that 
the computation unfolds at each process at the speed that is dictated by time(·). 

With that in mind, we want to show that every history can be realized by a physical computation 
that unfolds in physical time in a particularly convenient fashion. Namely we want to show that 
for every history there is a timestamp function that enjoys the additional property 

4. $\mathrm{time}(v_i(P)^{dq}) = i$ whenever $v_i(P)^{dq}$ exists 

In other words, all the notification events for view i occur at exactly the same time. By constructing 
such a realization we will demonstrate that race conditions where different processes have different 
ideas about group membership can be ignored and the local view change notification events at the 
various processes can be collapsed into a single event. This is exactly what we intend to do in 
section [SJ 

For this kind of timing to be possible we must show that the partial order $\preceq$ is stratified by view. 
In other words we must show that every event e can be assigned a value view(e) such that 

• $e \preceq f$ only if $\mathrm{view}(e) \le \mathrm{view}(f)$. 

• $\mathrm{view}(v_i(P)^{dq}) = i$. 

It turns out that this can be done for any history H. 

2.4.1 König's Lemma 

A key tool in this and subsequent investigations of the event order relation in H is König's Lemma, 
a well known property of some partially ordered sets. We use the following formulation of the 
lemma: 

Lemma (König's Lemma). Let A be an infinite partially ordered set with a first element $a_0$ where 
the following properties hold for any element $a \in A$: 

• a has a finite number of immediate successors. 

• If $b > a$ then there is an immediate successor $b_0$ of a such that $b \ge b_0$. 

Then A contains an infinite ascending branch. 

Proof. Call an element $a \in A$ heavy if it has an infinite number of successors. Then obviously $a_0$ is 
a heavy element. We will show that every heavy element has a heavy immediate successor. Once we do that 
we can choose a heavy successor $a_1$ to $a_0$, a heavy successor $a_2$ to $a_1$, etc. until we get an infinite 
ascending branch $a_0 < a_1 < a_2 < \cdots$. 

Let a be a heavy element. By assumption, each successor $b > a$ is mediated by an immediate 
successor of a. Since there are an infinite number of the former and only a finite number of the 
latter, there must be some immediate successor c of a that has an infinite number of its own 
successors. In other words, c is a heavy successor of a. □ 


2.4.2 Stratifying events by view 

We start our investigation by showing that the set of notification events of each view form a maximal 
semi-antichain in the partial order and that these semi-antichains partition the packet events by 
the view at which they occur, a notion that we will make precise. 

For each finite view i in the view interval $0 \le i < \mathfrak{V}$ define 

$G_i = \{v_i(P)^{dq} \mid P \text{ is a process and } v_i(P)^{dq} \text{ exists}\}$

The sets $G_i$ form a partition of all the notification events in H. 

We now extend the partition to all the events. For all finite $0 \le i < \mathfrak{V}$ define 

$\bar{K}_i = \{e \in E \mid g \preceq e \text{ for some } g \in G_i\}$

$K_i = \bar{K}_i \setminus \bigcup_{i<j<\mathfrak{V}} \bar{K}_j$ for finite $0 \le i < \mathfrak{V}$ 


We will now investigate the order relations between events in the different $G_i$ sets. We will show that 
each $G_i$ is a semi-antichain, meaning that there are no strict $\prec$-inequalities between its elements, 
and that elements in a high $G_i$ never precede elements in a low $G_i$. 
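
To make the stratification concrete, the following Python sketch computes the view of an event in a small finite event graph as the largest i for which some notification event of view i precedes or equals the event. The event names, the edge-list representation of the order and the function names are assumptions made for this example; contemporaneous events are not modeled.

```python
from typing import Dict, List, Set, Tuple


def predecessors_of(event: str, edges: List[Tuple[str, str]]) -> Set[str]:
    """Return every event g with g preceding or equal to `event`."""
    incoming: Dict[str, List[str]] = {}
    for earlier, later in edges:
        incoming.setdefault(later, []).append(earlier)
    seen = {event}
    stack = [event]
    while stack:
        current = stack.pop()
        for earlier in incoming.get(current, []):
            if earlier not in seen:
                seen.add(earlier)
                stack.append(earlier)
    return seen


def view_of(event: str, edges: List[Tuple[str, str]],
            notification_view: Dict[str, int]) -> int:
    """Largest view i such that some notification event of view i precedes `event`."""
    views = [notification_view[g] for g in predecessors_of(event, edges)
             if g in notification_view]
    if not views:
        raise ValueError(f"{event!r} is not preceded by any notification event")
    return max(views)


# Hypothetical two-process example: packet k is sent in view 0, packet m in view 1.
example_edges = [
    ("v0(P)", "k.nq"), ("k.nq", "v1(P)"), ("v1(P)", "m.nq"),
    ("v0(Q)", "k.dq"), ("k.dq", "v1(Q)"), ("v1(Q)", "m.dq"),
    ("k.nq", "k.dq"), ("m.nq", "m.dq"),
]
example_views = {"v0(P)": 0, "v0(Q)": 0, "v1(P)": 1, "v1(Q)": 1}
```

In this example both events of packet k have view 0 and both events of packet m have view 1.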

Lemma 1. Suppose that there are processes P and Q and views i, j such that 

$v_i(P)^{dq} \prec v_j(Q)^{dq}$

Then $i < j$. 


Proof. The Minimal Order Axiom implies that the relation $v_i(P)^{dq} \prec v_j(Q)^{dq}$ is derived from a 
sequence of parent/child relations ($v_{j(J)}(E)^{dq} \asymp J_{run}$) and queuing/dequeuing relations 
($k^{nq} \prec k^{dq}$). We will prove the claim by induction on the number of intermediate steps in the shortest 
derivation. 

If the derivation is immediate then the two events must occur at the same process, namely P = Q. 
In this case the Notification Event Axiom and the Notification Order Axiom imply that $i < j$ and 
we are done. 

If the derivation is longer, look at the first step. This step can be of packet type or parent/child 
type. 

If the first step is of packet type then there is a process R and a packet $k \in PR$ such that $k^{dq}$ 
exists, $k^{nq} \in E_P$ and 

$v_i(P)^{dq} \prec k^{nq} \prec k^{dq} \prec v_j(Q)^{dq}$

By the Piggyback Axiom this implies that $i < j(R)$ or $v_i(R)^{dq} \prec k^{dq}$. If $v_i(R)^{dq} \prec k^{dq}$ then we 
can create a shorter derivation leading from $v_i(R)^{dq}$ to $v_j(Q)^{dq}$ and conclude by induction that 
$i < j$. 

If $i < j(R)$ then we can create a shorter derivation leading from $R_{run} = v_{j(R)}(R)^{dq}$ to $v_j(Q)^{dq}$ and 
conclude by induction that $j(R) < j$ and therefore $i < j$. 

If the first step is of parent/child type then there is an initialized process J that is a child of P and 

$v_i(P)^{dq} \preceq v_{j(J)}(P)^{dq} \asymp J_{run} \preceq v_j(Q)^{dq}$

and either the rightmost or the leftmost inequality is strict. 

The left inequality implies, by the Notification Event Axiom and the Notification Order Axiom, 
that $i \le j(J)$. If the inequality is strict then $i < j(J)$. 

The right inequality implies by induction that $j(J) \le j$. If the inequality is strict then $j(J) < j$. 
Together these facts imply that $i < j$ and we are done. □ 

Corollary 1. $G_i \subseteq K_i$ 

Corollary 2. Every event $e \in E$ is a member of exactly one $K_i$. 

Proof. It follows directly from the definition that e cannot belong to more than one $K_i$. The difficult 
part is showing that e belongs to some $K_i$. The event e occurs at some process Q and therefore 
$Q_{run} \preceq e$. Therefore $e \in \bar{K}_{j(Q)}$. If $\mathfrak{V} < \infty$ then there is a largest j such that $e \in \bar{K}_j$ and therefore 
$e \in K_j$ and we are done. To finish the proof we have to show that even when $\mathfrak{V} = \infty$ there is such 
a largest j. 

By the Order Foundation Axiom, there is a largest j such that $v_j(Q)^{dq} \preceq e$. We will show that 
j is the maximal value we are looking for. Suppose there is a process P and a view i such that 
$v_i(P)^{dq} \preceq e$. If the derivation of this relation is immediate then P = Q and by definition $i \le j$. 

For a non-immediate relation we proceed by induction on the length of the shortest derivation. 

If the derivation starts with a packet type step then there is a process R and a packet $k \in PR$ such 
that 

$v_i(P)^{dq} \prec k^{nq} \prec k^{dq} \preceq e$

By the Piggyback Axiom the leftmost inequality implies that $i < j(R)$ or $v_i(R)^{dq} \prec k^{dq}$. If 
$v_i(R)^{dq} \prec k^{dq}$ then we can create a shorter sequence leading from $v_i(R)^{dq}$ to e and conclude by 
induction that $i \le j$. 

If $i < j(R)$ then we can create a shorter sequence leading from $R_{run} = v_{j(R)}(R)^{dq}$ to e and conclude 
by induction that $j(R) \le j$ and therefore $i < j$. 

If the derivation starts with a parent/child type step then there is a child J of P such that 

$v_i(P)^{dq} \preceq v_{j(J)}(P)^{dq} \asymp J_{run} \preceq e$

The lefthand inequality occurs within $E_P$ and therefore we know that $i \le j(J)$. Also, since $J_{run} = 
v_{j(J)}(J)^{dq}$ we can conclude by induction that the righthand inequality implies that $j(J) \le j$. 
Therefore $i \le j$ and we are done. □ 

Definition 2. If an event $e \in E$ belongs to $K_i$, we say that the view of e is i and denote it by 
view(e) = i. It follows from the proof of Corollary 2 that all the events in $E_P$ immediately following 
a notification event share the same view i. 

Next we investigate the order between the $K_i$ sets. 

Lemma 2. Let e and f be events such that $e \preceq f$. Then $\mathrm{view}(e) \le \mathrm{view}(f)$. 

Proof. By definition there is an event $g \in G_{\mathrm{view}(e)}$ such that $g \preceq e$. Therefore $g \preceq f$ and therefore 
$f \in \bar{K}_{\mathrm{view}(e)}$. It is easy to see from the definition of $K_i$ that this implies $\mathrm{view}(e) \le \mathrm{view}(f)$. □ 

Corollary 3. The partial event order $\prec$ is very well founded in E, meaning that for each event e, 
the set $\{f \in E \mid f \preceq e\}$ is finite. 

Proof. Let e be an event and let i be the view of e. Then for every preceding event $f \preceq e$ with view 
j we have $j \le i$ by Lemma 2. Therefore if a process P joins at a view that is higher than i then 
$E_P$ does not contain any predecessor of e because for any $f \in E_P$ we have $P_{run} \preceq f$ and therefore 
$\mathrm{view}(f) > i$. So all the events that precede e come from early joining processes, and it follows from 
the View Change Axiom that there is only a finite number of such processes. 

Suppose that the partially ordered set of predecessors of e (including e itself) is infinite. The 
Minimal Order Axiom implies that each event in the set has at most two immediate predecessors 
(one at the process and one at the channel. A join event of a child has one strict immediate 
predecessor at the parent process.) The same axiom, together with the Process Order Axiom and 
Order Foundation Axiom, implies that every predecessor is mediated by an immediate predecessor. 
By inverting the direction of the order, König's Lemma implies that there is an infinite decreasing 
sequence 

$e = e_0 \succ e_1 \succ e_2 \succ \cdots$

All the predecessors of e reside on a finite number of early joining processes, and therefore there is 
one process X that contains an infinite number of the events in the regression chain $e_{j_1} \succ e_{j_2} \succ \cdots$. 
But this means that $e_{j_1}$ is an event in $E_X$ that has an infinite number of predecessors at the same 
process X. This contradicts the Order Foundation Axiom. □ 


2.4.3 Creating an event timeline 

We want to show that the properties of the event order in H are sufficient for the creation of a 
timeline that meets the basic timeline criteria 1-3 as well as the additional criterion 4. 

The naive plan is to start by assigning time(g) = v for each event $g \in G_v$. This should work thanks 
to Lemma 1 and because the sets $G_v$ are finite. Then it would be tempting to squeeze all the events 
in $K_v \setminus G_v$ into the open time interval $(v, v + 1) \subset \mathbb{R}$. 

This would have worked if we could guarantee the finiteness of $K_v$. But we cannot do that because 
of a number of permissible pathological situations. For example, a process can be removed and yet 
not halt. Such a process could generate an infinite number of events with a fixed view. Another 
example is a process that is not removed, but stops receiving GMS notifications at some point. As 
a result all the events at the process have a fixed view beyond a certain point. A reasonable group 
communication protocol would eliminate such pathologies (e.g. through various timeout clocks that 
would force a process to halt when a pathology is suspected). In Section 2.5.1 we explore a set of 
such "reasonableness" assumptions. For now we explore the more general situation that requires 
more finesse in constructing the timeline. 


The idea is to force the events in $G_v$ to occur contemporaneously while allowing some events in $K_v$ 
to occur indefinitely late, bounding their timing by a measure of their pathology. We measure the 
pathology of an event e using the notion of a view bound: 

Definition 3. Let $K = E \setminus \bigcup_i G_i$ and let $e \in K$ be an event. The view bound of e is 

$\mathrm{vb}(e) = \begin{cases} \min\{w \mid \exists g\,(e \prec g \in G_w)\} & \text{if such views exist} \\ \mathfrak{V} & \text{otherwise} \end{cases}$

Definition 4. $K^b$ denotes the set of events in K of bound $b < \mathfrak{V}$. $K^\infty$ denotes the unbounded 
events in K. 

It follows from the definition of view() and from Lemma 1 that $\mathrm{vb}(e) > \mathrm{view}(e)$. 

It follows from Corollary 3 and from the fact that the sets $G_i$ are finite that the set $K^b$ is finite for 
each $b < \mathfrak{V}$. The set $K^\infty$ may be infinite. 

We can now amend our timing plan by placing the events in the finite set $K^b$ on the time interval 
$(b - 1, b)$ and by carefully distributing the events in $K^\infty$ between different time intervals. To do 
this we need the following set-theoretic lemma: 

Lemma 3. Let $(A, \prec)$ be a countable, very well founded, partially ordered set. Then the elements 
of A can be listed in a sequence that respects the partial order. In other words, the partial order 
$\prec$ can be extended to a total order of order type $\omega$, the order type of the natural numbers. 

We will prove the lemma below. We use Lemma 3 to order the elements of $K^\infty$ into a sequence 

$K^\infty = \{e^1, e^2, e^3, \ldots\}$

that respects the $\preceq$-order. 

Now we can place all the events of E on a timeline. 

Obviously, every event $g \in G_v$ occurs at time v. As we mentioned before, the events of $K^b$ occur 
in the time interval $(b - 1, b)$. Lemma 3 guarantees that the events in $K^b$ can be arranged along 
that interval in a way that respects the $\preceq$-order. In order to reserve some free time to schedule $K^\infty$ 
events in the interval, we restrict the events in $K^b$ to the smaller interval $(b - 1, b - 1/2)$. 

Now we can place the events of $K^\infty$ along the timeline. To make matters simpler, we will place 
all of the events in this set at times of the form $\mathrm{time}(e^i) = n_i + 3/4$ where $n_i \in \mathbb{N}$. We do that 
inductively by defining: 

$\mathrm{time}(e^i) = \max\{\lfloor \mathrm{time}(f) \rfloor \mid f \prec e^i\} + 7/4$ ($\lfloor x \rfloor$ stands for the integer part of x) 

This inductive formula is well defined because each $e^i$ has a finite number of predecessors (Corollary 
3) and because every predecessor of $e^i$ in $K^\infty$ must come earlier in the sequence than $e^i$ and therefore 
has its time already defined. 

Our construction of a timeline for E clearly satisfies conditions 1, 3 and 4. We have to 
demonstrate that condition 2, which stipulates that the timeline respects the $\prec$-order, is also 
satisfied. Take any two events $e_1, e_2$ such that $e_1 \prec e_2$. 

If $e_2 \in K^\infty$ then by construction $\mathrm{time}(e_1) < \mathrm{time}(e_2)$. Henceforth we will assume that if $e_2 \in K$ 
then $\mathrm{vb}(e_2) < \mathfrak{V}$. 

If $e_1 \in G_i$ and $e_2 \in G_j$ then Lemma 1 implies that $\mathrm{time}(e_1) = i < j = \mathrm{time}(e_2)$. 

If $e_1 \in G_i$ and $e_2 \in K$ then Lemma 2 implies that $\mathrm{view}(e_2) \ge i$ and therefore $\mathrm{vb}(e_2) > i$. Since we 
can also assume $\mathrm{vb}(e_2) < \mathfrak{V}$ it follows that $\mathrm{vb}(e_2) - 1 < \mathrm{time}(e_2) < \mathrm{vb}(e_2)$. Therefore 

$\mathrm{time}(e_1) = i \le \mathrm{vb}(e_2) - 1 < \mathrm{time}(e_2)$

If $e_1 \in K$ and $e_2 \in G_j$ then by definition $\mathrm{vb}(e_1) \le j < \mathfrak{V}$ and therefore 

$\mathrm{time}(e_1) < \mathrm{vb}(e_1) \le j = \mathrm{time}(e_2)$

We are left with the case where $e_1, e_2 \in K$. The order $e_1 \prec e_2$ implies $\mathrm{vb}(e_1) \le \mathrm{vb}(e_2)$ and we are 
assuming that $\mathrm{vb}(e_2) < \mathfrak{V}$. If there is a strict inequality $\mathrm{vb}(e_1) < \mathrm{vb}(e_2)$ then 

$\mathrm{time}(e_1) < \mathrm{vb}(e_1) \le \mathrm{vb}(e_2) - 1 < \mathrm{time}(e_2)$

If there is equality then both $e_1$ and $e_2$ belong to the same set $K^b$ and their timing 
matches their order by construction. 
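
For a finite event graph the timeline construction can be carried out mechanically. The Python sketch below is an illustration under simplifying assumptions: the order is given as a list of (earlier, later) edges, contemporaneous events are not modeled, every non-notification event is assumed to have a notification event among its predecessors, and the bounded events of each bound b are spread evenly over (b - 1, b - 1/2). The function name and the event names used with it are hypothetical.

```python
import math
from fractions import Fraction
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


def build_timeline(edges: List[Tuple[str, str]],
                   notification_view: Dict[str, int]) -> Dict[str, Fraction]:
    """Assign a time to every event so that earlier events get smaller times."""
    events = set(notification_view) | {x for edge in edges for x in edge}
    direct: Dict[str, Set[str]] = {e: set() for e in events}
    for earlier, later in edges:
        direct[later].add(earlier)
    # A linear extension of the order; predecessors always come first.
    order = list(TopologicalSorter(direct).static_order())
    ancestors: Dict[str, Set[str]] = {}
    for e in order:
        ancestors[e] = set()
        for p in direct[e]:
            ancestors[e] |= ancestors[p] | {p}

    # Notification events of view v occur at time v.
    time = {g: Fraction(v) for g, v in notification_view.items()}

    bounded: Dict[int, List[str]] = {}   # view bound b -> events of bound b
    unbounded: List[str] = []            # events with no later notification
    for e in order:
        if e in notification_view:
            continue
        later_views = [v for g, v in notification_view.items() if e in ancestors[g]]
        if later_views:
            bounded.setdefault(min(later_views), []).append(e)
        else:
            unbounded.append(e)

    # Bounded events of bound b go into (b - 1, b - 1/2), in order.
    for b, members in bounded.items():
        for idx, e in enumerate(members, start=1):
            time[e] = Fraction(b - 1) + Fraction(idx, 2 * (len(members) + 1))

    # Unbounded events go to times of the form n + 3/4, after all predecessors.
    for e in unbounded:
        time[e] = max(math.floor(time[f]) for f in ancestors[e]) + Fraction(7, 4)
    return time
```

Exact fractions are used instead of floating point so that the strict inequalities between times can be checked without rounding concerns.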

Proof of Lemma 3. The set A is countable. Let the bijection $h : \mathbb{N} \to A$ be an arbitrary ordering 
of A into a "bad" sequence (one that does not necessarily extend the $\prec$-order). 

Let $c_* = c_1, c_2, c_3, \ldots$ be an arbitrary sequence of natural numbers where each number appears an 
infinite number of times (for example one could use the sequence 1, 2, 1, 3, 2, 1, 4, 3, 2, 1, ...). 

To create the sequential extension of the $\prec$-order, start with an empty "good" sequence. We will 
append the elements of A to the good sequence one at a time using the following infinite process. 
At the n-th step, look at the $c_n$-th element of the bad sequence, namely the element $h(c_n)$. Now do 
the following: 

1. If $h(c_n)$ has already been appended to the good sequence, do nothing. 

2. If some predecessor $a \prec h(c_n)$ has not yet been appended to the good sequence, do nothing. 

3. If $h(c_n)$ has not yet been appended to the good sequence, but all of its predecessors in the 
$\prec$-order have been appended, append $h(c_n)$ to the good sequence now. 

The good sequence that this procedure generates has some obvious good properties. It includes at 
most one copy of each element of A, and the sequence order respects the $\prec$-order. We just have to 
show that every element of A is eventually appended to the good sequence. 

Assume instead that some element of A is never appended to the good sequence. Look at the subset 
$A_0 \subset A$ of elements that are never appended. This subset is not empty by assumption. Since the 
$\prec$-order on A is very well founded, there must be a $\prec$-minimal element in $A_0$. Suppose that this 
minimal element is $a_0 = h(j)$. Since the $\prec$-order is very well founded, the element $a_0$ has a finite 
number of $\prec$-predecessors $a_1, a_2, \ldots, a_m$. 

By assumption all of these predecessors are appended to the good sequence, and this must happen 
by some finite step $n_0$. By our assumption on the sequence $c_*$ there is some high number $n > n_0$ 
such that $c_n = j$. At step n the element $a_0 = h(j) = h(c_n)$ will be inspected and it will be found 
that all of its predecessors have already been appended to the good sequence. As a result $a_0$ will 
be appended at this point, contrary to our assumption. □ 
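
The procedure in the proof of Lemma 3 can be tried out on a finite partially ordered set. The Python sketch below follows the three rules of the proof, using the example index sequence 1, 2, 1, 3, 2, 1, ... mentioned there. The function names, the dictionary of predecessors and the stopping rule (stop once every element has been appended, which only makes sense for a finite set) are assumptions made for this illustration.

```python
from itertools import count
from typing import Dict, Iterator, List, Sequence, Set


def revisiting_indices() -> Iterator[int]:
    """Yield 1, 2, 1, 3, 2, 1, 4, 3, 2, 1, ... so that every index recurs forever."""
    for top in count(1):
        for c in range(top, 0, -1):
            yield c


def good_sequence(bad: Sequence[str], preds: Dict[str, Set[str]]) -> List[str]:
    """Reorder `bad` into a sequence that respects the order given by `preds`.

    bad   -- the elements of a finite set A in an arbitrary order
    preds -- maps an element to its predecessors, all of which must appear
             in `bad` and must not form a cycle, otherwise the loop never ends
    """
    good: List[str] = []
    placed: Set[str] = set()
    for c in revisiting_indices():
        if len(good) == len(bad):          # finite-case stopping rule
            break
        if c > len(bad):                   # index beyond the finite bad sequence
            continue
        candidate = bad[c - 1]
        if candidate in placed:            # rule 1: already appended
            continue
        if not preds.get(candidate, set()) <= placed:
            continue                       # rule 2: some predecessor is missing
        good.append(candidate)             # rule 3: append now
        placed.add(candidate)
    return good
```

For the bad sequence d, c, b, a with a preceding b and c, and b and c preceding d, the sketch returns a, b, c, d.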

2.5 Understanding Faults 

A history represents a possible computation by a group of processes in our model. Such a computation 
can be plagued by different types of stop faults - halting processes, dropped packets and 
dropped GMS notifications. Our goal is to demonstrate that by adding a small number of reasonable 
restrictions to the model, we will be able to analyze all histories in our model under the assumption 
that all faults are eliminated except for one simple fault - the simultaneous halt of all the processes 
in the group. This is a very desirable property, because it means that the whole group behaves the 
same way a single process would - albeit at a much higher performance and availability level. 

The fault simplification techniques that we propose all involve the use of timers, timeouts and 
voluntary process halts when timeouts occur. But we are going to keep our computational model 
asynchronous and avoid introducing the notion of time explicitly. We achieve this by introducing 
axiomatic correlations between certain faults. These correlations can be forced to occur through 
the use of timers and timeouts, but these implementation details are not part of the model itself. 

Let us start by looking at notifications. Suppose that P fails to dequeue a notification that P is 
entitled to. How can that happen? According to the Second Halting Axiom either some $v_j(P)$ is 
dropped or P halts. So the blame can be assigned to P or to GMS. If we want to simplify the fault 
model and eliminate GMS faults, we must shift the blame to P. But this is not possible unless P 
halts. A reasonable implementation will not allow a process P to continue running indefinitely when 
GMS becomes unresponsive. At some point P will time out and halt voluntarily. If that happens 
then we have a hope of shifting the blame to P and away from GMS. 

A similar observation can be made about packets. Suppose there is a packet $k \in PQ$ that is queued 
by P but not dequeued by Q. How does that happen? According to the Third Halting Axiom either 
P halts (which opens the possibility that k is never sent), or Q halts (which opens the possibility 
that k is never dequeued even if it is received) or else k - or some packet ahead of k - is dropped. So 
the blame can be assigned to P, to Q or to the channel PQ. If we want to simplify the fault model 
and eliminate PQ faults, we must shift the blame either to P or to Q. But this is not possible 
unless one of them halts. A reasonable implementation will not allow P and Q to continue running 
indefinitely when the channel between them becomes unresponsive. At some point one of them 
must be successfully evicted, and failing that, both of them will time out at some point and halt 
voluntarily. If that happens then we have some hope of shifting the blame to P or Q and away 
from the channel between them. 

This built-in ambiguity of the boundary between a process and its channels and membership service 
allows us to reinterpret the root cause of visible faults. If a packet is not dequeued, we can shift 
the blame from the channel to the process and vice versa. If a notification is not dequeued we can 
shift the blame from the membership service to the process and vice versa. But there are limits 
to this blame shifting game. If a process halts, we cannot keep on blaming it forever. If other 
processes keep sending it packets indefinitely, then at some point the channels leading to the halted 
process must start dropping packets since there is no process on the target side to receive them. If 
the membership service keeps attempting to inform the halted process of view changes, and never 
succeeds in evicting it, then its notifications will have to start dropping at some point. 

Well implemented processes will not, however, keep attempting to send packets to a process that 
has already halted a long time ago. A well implemented membership service will not keep a halted 
process as a member in view after view indefinitely. A well implemented process will monitor its 
channels and GMS and attempt to detect problems and remove them, and at some point it will have 
to give up and halt. 

If everyone behaves well enough then perhaps the blame for all the faults in the system can be 
shifted to the processes. 

The process faults themselves cannot be eliminated entirely, but they can be greatly simplified. 
When a process halts, there can be an arbitrary, finite number of unsent packets in its send queues, 
and an arbitrary finite number of packets, notifications and message broadcast requests in its receive 
queues. The halt can occur in the middle of the processing of a trigger event, when some of the 
side effect events have already occurred while others have not. This is very different than what one 
would expect to happen when a process is removed in an orderly fashion. In an orderly removal you 
would expect a notification to go to all the processes, including the removed process itself, and you 
would expect the group to enter a quiescent period during which all ongoing tasks are completed 
and all packets in flight are received and processed while the application execution is put on hold 
so it does not create any new work. Only once the group is fully quiescent will the removed process 
halt, the group reconfigure itself, and finally resume its normal work under the new configuration. 

We will demonstrate that under reasonable assumptions, a general process halt can be simplified to 
look like an orderly removal - with the aforementioned exception of a simultaneous halt of the whole 
group. This means that with this one exception, we can view any process halt as the consequence of 
a planned process removal, rather than as a failure that was the cause of a removal. In other words, 
we can make process failures go away entirely, with the one notable exception of a simultaneous 
system-wide failure. 

First of all we can assume that a halting process completes the execution of its final transaction 
before halting (see Section \2?I \ for a definition of transactions). This can affect its send queues and 
its internal state, but it does not affect any of the remaining processes as long as the packets that 
the transaction generates remain stuck in a send queue. Also, as long as the newly generated packets 
are not sent out they do not adversely affect the perceived reliability of the outbound channels of 
the process. 

A further simplification can be achieved by emptying out the receive queues of the process. This is 
more tricky. The basic idea is pretty simple - just assume that the process dequeues and processes 
all the items in its receive queues before it halts. This can generate a lot of side effects, but as long 
as the side effects are stuck in their respective send queues, the surviving members of the group 
will not be affected. 

A final simplification can be achieved by emptying out some of the send queues. This is the most 
tricky part since it allows a process that is presumably already halted to affect other processes 
by sending them additional packets. In a sense the halted process can alter the information held 
by other processes and change history "from the grave". This can be made acceptable if we limit 
ourselves to only sending packets bound to processes that appear to have halted even earlier. This 
should confine any new information to the world of the dead and prevent it from altering history. 

There are three additional complications. 

The first and easiest complication is that the order of processing must comply with the Piggyback 
Axiom. 

The second and more pernicious complication is that the processing must comply with the Self 
Channel Axiom. The issue is that while most side effects can be left stuck in a send queue or sent 
"downstream" to an already halted process, the latter axiom sometimes forces packets on the self 
channel to be sent, received and processed, thus generating more side effects, which themselves 
could result in more packets on the self channel, ad infinitum. We cannot allow that to happen 
because this violates the First Halting Axiom. This problem does not go away by itself. But it 
does not arise if PROTOCOL is implemented in a reasonable fashion and does not generate infinite 
loops for itself in the absence of external stimuli. If the protocol meets this vacuum convergence 
condition, then we can assume that a halting process leaves no unprocessed or partially processed 
items. 

The third and perhaps subtlest complication is that processes can give birth to child processes 
before they halt, so even if a packet is bound to a target that halts earlier than its source did, the 
information that is conveyed by that packet may live on inside a child of the target process - and 
the child could live indefinitely. This complication is bound together with a more general issue of 
race conditions that can arise when a process births a child. As with the previous complication, 
these race condition issues cannot be wished away. Instead we must require that PROTOCOL behave 
in a way that prevents race conditions from occurring. An additional but less critical simplifying 
requirement is that processes that fail to initialize must not be chosen by GMS to be the parents of 
child processes. 

These observations lead us to consider a sub-class of histories that arise from conforming models, 
which are models that assume well behaving processes; a well behaving membership service; and a 
well behaving PROTOCOL layer. We will see that for such histories it is possible to drastically simplify 
the faults that need to be considered. 


2.5.1 Conforming models and conforming histories 


We claimed that under some assumptions of "reasonableness" of the way the drivers and PROTOCOL
are implemented in a model we have a hope of simplifying the behavior of faults. We are going 
to make this notion exact by introducing a number of new axioms that are less general than the 
axioms of 12. R. 41 but represent behaviors that would be expected from a reasonable implementation. 
As we mentioned before, implementing these reasonable behaviors requires timers, but the new
axioms preserve asynchrony by describing the model simplifications as mere correlations between
faults.


We start with a few definitions. 

Definition 5. A stunted history is a history where the number of processes is finite and all of 
them halt. 

Definition 6. A finite channel is a channel where only a finite number of packets are dequeued.
In other words, a channel $\overrightarrow{PQ}$ is finite if the set of packets $k \in \overrightarrow{PQ}$ that have a
dequeuing event is finite.

Conforming Channel Axiom

If $\overrightarrow{PQ}$ is finite, then either P is removed or Q halts.


Conforming Packet Axiom 

A process P does not dequeue packets from process Q after a removal notification for Q is dequeued
by P.


Conforming Notification Axiom 

If any notification $v_i(P)$ is dropped then P halts.


Conforming GMS Axiom 

If a process halts then it has a finite view interval.


Conforming Parent Axiom 

If a process is uninitialized then it has no child processes. 


Conforming Halt Axiom 

If a process is removed then it halts. 

Definition 7. A conforming history is a history that satisfies all the conforming axioms. 
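
Three of the conforming axioms (GMS, Parent and Halt) are statements about a single process, so they can be checked one process at a time. The following Python sketch illustrates this on a toy record. The `Proc` class, its field names and `conforming_violations` are hypothetical and are not part of the model described in this paper; the Channel, Packet and Notification axioms need event-level data and are not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    # Hypothetical per-process summary; field names are assumptions.
    name: str
    removed: bool = False
    halts: bool = False
    initialized: bool = True
    finite_view_interval: bool = False
    children: list = field(default_factory=list)


def conforming_violations(procs):
    """Return (axiom, process name) pairs for the per-process axioms."""
    bad = []
    for p in procs:
        # Conforming GMS Axiom: a halting process has a finite view interval.
        if p.halts and not p.finite_view_interval:
            bad.append(("Conforming GMS Axiom", p.name))
        # Conforming Parent Axiom: an uninitialized process has no children.
        if not p.initialized and p.children:
            bad.append(("Conforming Parent Axiom", p.name))
        # Conforming Halt Axiom: a removed process halts.
        if p.removed and not p.halts:
            bad.append(("Conforming Halt Axiom", p.name))
    return bad
```

An empty result means that none of these three axioms is violated by the summarized processes; it does not establish that the history is conforming.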


2.5.2 Fault equivalence 
Definition 8. A downstream channel is a channel $\overrightarrow{PQ}$ where $r(P) > r(Q)$ or $P = Q$. In other words,
it is either a self channel or a channel whose target is removed from the group before its source
is removed. An upstream channel is a channel $\overrightarrow{PQ}$ where $r(P) < r(Q)$ and $P \neq Q$. An upstream
packet or downstream packet is a packet that belongs to an upstream or downstream channel,
respectively.
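
The classification of a channel depends only on the removal views of its two endpoints. A minimal Python sketch, assuming that removal views are given as a dictionary and that a process that is never removed is represented by `float("inf")` (both are assumptions of this illustration, as is the function name):

```python
def classify_channel(source, target, removal_view):
    """Classify the channel source -> target by the removal views r(.)."""
    if source == target:
        return "downstream"            # a self channel is downstream
    r_src, r_tgt = removal_view[source], removal_view[target]
    if r_src > r_tgt:
        return "downstream"            # the target is removed first
    if r_src < r_tgt:
        return "upstream"              # the source is removed first
    # Distinct processes with equal removal views are not covered by the
    # two cases of the definition as stated, so they are reported separately.
    return "unclassified"
```

For example, with `removal_view = {"P": 5, "Q": 3}` the channel from P to Q is downstream and the channel from Q to P is upstream.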

Definition 9. A vacuum event is an event e at a halting process that has no lasting effect on the
history. To be precise, e is a vacuum event if its successor events

$\{f \mid f \succ e\}$

form a finite set, and all of them occur at halting processes.

A vacuum packet is a packet k whose queuing event is a vacuum event.

Obviously, all the successors of a vacuum event are vacuum events as well.
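
On a finite model the definition can be checked by walking the successor relation. The sketch below is only an illustration: it assumes that the primitive order relations are given as an adjacency dictionary `edges`, that `process_of` maps each event to its process, and that `halting` is the set of halting processes (all hypothetical names). Because the model is finite, the finiteness condition holds automatically and only the "all successors occur at halting processes" condition is tested.

```python
def successors(event, edges):
    """All events reachable from `event` through primitive order relations."""
    seen, stack = set(), [event]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_vacuum_event(event, edges, process_of, halting):
    """True if the event and all of its successors occur at halting processes."""
    if process_of[event] not in halting:
        return False
    return all(process_of[f] in halting for f in successors(event, edges))
```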

Definition 10. Two histories are said to be fault equivalent if they are identical with the following 
exceptions: 

• packets that are sent in one need not be sent in the other. Packets that are received in one 
need not be received in the other. 
• notifications that are dropped in one need not be dropped in the other.

• the two histories have the same non-vacuum events.

• the two histories have the same non-vacuum packets. 

Definition 11. A history H is called lossless if 

• H has no dropped packets 

• H has no dropped notifications 

• All upstream channels clear their receive queues. In other words all received upstream packets 
are dequeued. 

• All downstream channels clear their send queues. In other words all queued downstream 
packets are sent. 
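
The four conditions of Definition 11 can be expressed as a simple predicate over fault labels. The following Python sketch uses a hypothetical `Packet` record; it assumes that a dropped packet is one that is sent but never received, which is an assumption of this illustration rather than a definition quoted from the text.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    direction: str          # "upstream" or "downstream"
    sent: bool = False
    received: bool = False
    dequeued: bool = False

    @property
    def dropped(self):
        # Assumption: dropped means sent but never received.
        return self.sent and not self.received


def is_lossless(packets, dropped_notifications=0):
    """Check the conditions of Definition 11 on a list of queued packets."""
    if dropped_notifications:
        return False
    for p in packets:
        if p.dropped:
            return False
        # Upstream channels clear their receive queues.
        if p.direction == "upstream" and p.received and not p.dequeued:
            return False
        # Downstream channels clear their send queues.
        if p.direction == "downstream" and not p.sent:
            return False
    return True
```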


2.5.3 The Vacuum Loop and vacuum closure 

Definition 12. Let H be a lossless history and let P be any halting process in H. If $j(P) > 0$, let
E be the parent of P and assume that the dequeuing event of the join notification $v_{j(P)}(E)$ exists.
Look at P at the moment that it halts.

Let v be the highest view processed by P, namely the value of the highest notification $v_v(P)$ that
was processed by P before it halted. By the Order Foundation Axiom such a maximal view always
exists unless $E_P = \emptyset$. In the latter case set $v = j(P) - 1$.

Look at the following loop, called the vacuum loop:

1. If the event $P_{hlt}$ exists at the end of $E_P$:

• remove it

• return the notification $v_{r(P)}(P)$ to the GMS receive queue

• set $v = r(P) - 1$

2. Finish processing the current item. If any new packet multicasts are generated, append the
respective packet queuing events at the end of $E_P$. Queue any new upstream packets to
their respective send queues and declare them to be unsent and unreceived. Queue any new
downstream packets to their respective receive queues and declare them to be sent and received.

3. If $v > j(P)$ and there are any requests in the APP receive queue, dequeue and process them
one by one. For each request, append the generated request dequeuing event at the end of
$E_P$. If the processing of a request creates any side effects, append the resulting packet queuing
events at the end of $E_P$. Queue any new upstream packets to their respective send queues
and declare them to be unsent and unreceived. Queue any new downstream packets to their
respective receive queues and declare them to be sent and received.

4. If $v = r(P) - 1$, dequeue and process all the packets in all the receive queues of channels
$\overrightarrow{QP}$ where $Q \neq P$. For each packet, append the corresponding dequeuing event to $E_P$ and process
the packet. If the processing of a packet creates any side effects, append the resulting packet
queuing events at the end of $E_P$. Queue any new upstream packets to their respective send
queues and declare them to be unsent and unreceived. Queue any new downstream packets to
their respective receive queues and declare them to be sent and received.

5. If the receive queue of the self channel is empty, proceed to step (6). Otherwise, dequeue
all the packets there. For each packet, append the corresponding dequeuing event to $E_P$
and process the packet. If the processing creates any side effects, append the resulting packet
queuing events at the end of $E_P$. Queue any new upstream packets to their respective send
queues and declare them to be unsent and unreceived. Queue any new downstream packets to
their respective receive queues and declare them to be sent and received. Repeat this step until
the receive queue of the self channel is empty.

6. Increment v.

• If $v = r(P)$, append a $P_{hlt}$ event to $E_P$ and exit.

• Dequeue the $v_v(P)$ notification and process it. Append the dequeuing event of $v_v(P)$ to $E_P$. If
$v = j(P) > 0$, add the order relation stating that the dequeuing event of $v_{j(P)}(E)$ at the parent E
precedes the dequeuing event of $v_v(P)$. Notice that in this case the dequeuing event of
$v_v(P)$ is $P_{run}$.

• If the processing of the notification creates any side effects, append the resulting packet
queuing events at the end of $E_P$. Queue any new upstream packets to their respective
send queues and declare them to be unsent and unreceived. Queue any new downstream
packets to their respective receive queues and declare them to be sent and received.

• Go back to step (2).
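
The control flow of the vacuum loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the state dictionary `st`, the `handle` callback (standing in for the PROTOCOL callbacks), the `classify` function and the `budget` guard are all hypothetical. The sketch assumes a finite removal view r, treats GMS notifications implicitly as view numbers, follows the comparison in step 3 literally as it appears in the text, and returns to the top of the loop body after step 6 (at that point steps 1 and 2 have nothing left to do).

```python
from collections import deque


def vacuum_loop(st, handle, classify, budget=10_000):
    """Run the vacuum loop on the halting state `st` of one process.

    Hypothetical state keys: name, j, r, v, events (the list E_P),
    current (item in flight or None), app (deque of requests),
    inbox (dict: source -> deque of received packets), out (list).
    """
    me = st["name"]
    left = [budget]                      # guard against non-convergence

    def tick():
        left[0] -= 1
        if left[0] < 0:
            raise RuntimeError("vacuum loop did not converge")

    def emit(effects):
        # New packets: append queuing events and label faults by direction.
        for target, payload in effects:
            st["events"].append(("queue", target, payload))
            if classify(me, target) == "downstream":
                st["out"].append((target, payload, "sent+received"))
                if target == me:         # self channel feeds our own queue
                    st["inbox"].setdefault(me, deque()).append(payload)
            else:
                st["out"].append((target, payload, "unsent+unreceived"))

    def drain(source):
        queue = st["inbox"].setdefault(source, deque())
        while queue:
            tick()
            pkt = queue.popleft()
            st["events"].append(("dequeue", source, pkt))
            emit(handle(("packet", source, pkt)))

    # Step 1: undo an existing halt; the removal view is replayed in step 6.
    if st["events"] and st["events"][-1] == ("hlt",):
        st["events"].pop()
        st["v"] = st["r"] - 1
    # Step 2: finish the item that was in flight when the process halted.
    if st["current"] is not None:
        emit(handle(st["current"]))
        st["current"] = None
    while True:
        tick()
        # Step 3: application requests (comparison as stated in the text).
        if st["v"] > st["j"]:
            while st["app"]:
                req = st["app"].popleft()
                st["events"].append(("request", req))
                emit(handle(("request", req)))
        # Step 4: just before the removal view, drain all non-self channels.
        if st["v"] == st["r"] - 1:
            for source in list(st["inbox"]):
                if source != me:
                    drain(source)
        # Step 5: drain the self channel until it stays empty.
        drain(me)
        # Step 6: advance to the next view, halting at the removal view.
        st["v"] += 1
        if st["v"] == st["r"]:
            st["events"].append(("hlt",))
            return st
        st["events"].append(("notify", st["v"]))
        emit(handle(("notify", st["v"])))
```

Running the sketch a second time on its own output adds no events, which corresponds to the history being vacuum closed at the process. A `handle` callback that keeps generating self-channel packets exhausts the budget, which is the situation that vacuum convergence rules out.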

Lemma 4. Let H be a lossless history and let P be a halting process in H. Assume that P is
either a member of view zero, or else the parent of P processes the notification of the joining of P
(that is, the dequeuing event of $v_{j(P)}(E)$ exists, where E is the parent of P).

Suppose that the vacuum loop is run against the halting state of P for a finite number of steps. 
Then the resulting structure is a lossless history that is fault equivalent to H. Moreover, if the 
original history H is conforming then the extended history is conforming as well. 

Proof. We start by showing that the extended structure is a lossless history. Then we show that it 
is fault equivalent to the original history H. 

The proof that the extension of H satisfies the history axioms is by induction. Suppose that all the 
history axioms remain true, and the history remains lossless after going through a certain number 
of steps. We will show that the same remains true after running through one more step. For most 
axioms we do not actually need the inductive hypothesis - they are either trivially true or can be 
demonstrated directly. But there are a few exceptions. 

The Packet Event Axiom remains true because every new packet that we create comes with a
queuing event per multicast. We only add a dequeuing event to packets that are already queued
(steps (4) and (5)).

The Packet Order Axiom is violated if there are packets $k, k' \in \overrightarrow{XY}$ such that the queuing event
of k precedes the queuing event of k', and the dequeuing event of k' exists, and yet the dequeuing
event of k does not exist or does not precede the dequeuing event of k'.

Suppose that the dequeuing event of k' already existed prior to the current step. By induction the
extension of H was a history at the conclusion of the last step, and therefore the queuing event of k'
already existed, and as a result the preceding queuing event of k had existed as well. Therefore by
induction the dequeuing event of k had also existed and preceded the dequeuing event of k', and we
are done.

Assume therefore that the dequeuing event of k' is generated in the current step. This implies Y = P.
Moreover, the only steps that create packet dequeuing events are steps (4) and (5).

If the dequeuing event of k' is generated in the current step then the packet k' is already received
after the previous step. The induction hypothesis implies that k' is a downstream packet in this case,
or else it would have been dequeued already due to losslessness (if we are at step (5) then k' is
downstream by definition). Therefore, again by induction, losslessness implies that the packet k must
have been sent (and therefore received) before the current step. Since k was queued before k', it must
have been dequeued before k' is dequeued. Therefore the dequeuing event of k exists and precedes
the dequeuing event of k'.

The Notification Order Axiom remains true because step (6) dequeues notifications in order and
without gaps.

The Parent Axiom remains true thanks to the conditions we imposed on P. 

The Process Liveness Axiom is a statement about the behavior of PROTOCOL callbacks, which are 
the only part of the model that generates queuing events. Since the vacuum loop processes triggers 
using the appropriate PROTOCOL callbacks, the axiom remains valid. 

Suppose that the Piggyback Axiom is violated. Then there is a packet $k \in \overrightarrow{QP}$ that we dequeue in
step (4) or (5) of the vacuum loop even though there is an i such that $i < j(Q)$ or the dequeuing
event of $v_i(Q)$ precedes the queuing event of k, and yet $i > j(P)$ and either the dequeuing event of
$v_i(P)$ does not exist or it exists and is preceded by the dequeuing event of k.

In the case of the self-channel (step (5)) this can be easily seen to be absurd by substituting Q = P.

In the case of step (4) the dequeuing event of k comes after the dequeuing event of $v_v(P)$ by
definition, and so it must be that $i > v$. Since $v = r(P) - 1$ in this case, we have $i \geq r(P)$. So we
either have $r(P) < j(Q)$ or the dequeuing event of $v_{r(P)}(Q)$ precedes the queuing event of k. Either
of these cases violates the Process Liveness Axiom, which holds by induction prior to the current step.

To see that the Self Channel Axiom remains true, notice that the vacuum loop dequeues a notification
(in step (6)) only after it verifies that all the packets on the self channel are processed (in step (5)).

The Order Foundation Axiom is mostly trivial except for the $P_{run}$ and $P_{hlt}$ part.

If $P_{run}$ does not already exist then $v = j(P) - 1$ and there was no current item to finish processing
when the vacuum loop started. By induction, $E_P = \emptyset$. All the steps except step (6) become no-ops
and no new events are added to $E_P$. To see that step (5) is a no-op, notice that a packet can reside
in the receive queue of the self-channel only if it was previously queued to the send queue. This
would show up as a queuing event, which is not possible in our case.

Step (6) increments v to equal $j(P)$ and adds the dequeuing event of $v_v(P) = v_{j(P)}(P)$, which is
$P_{run}$, as the first element of $E_P$. This proves this case.

Step (1) of the vacuum loop removes $P_{hlt}$ from $E_P$ and returns it to the GMS receive queue, if
necessary. Step (6) can create or restore the $P_{hlt}$ event. If that happens then the loop exits, leaving
$P_{hlt}$ as the last event in $E_P$.

All the other axioms are trivial to verify. For the Second Halting Axiom one just needs to remember 
that we are dealing here exclusively with lossless histories, so there are no dropped notifications. 

We still have to show that executing each step of the vacuum loop preserves losslessness (see
Definition 11). Since we start with a lossless history we have no dropped notifications. Every
packet that is created by the vacuum loop is either unsent and unreceived (if it is an upstream
packet) or sent and received (if it is a downstream packet), so the loop never creates a dropped
packet and guarantees the losslessness conditions with regard to upstream and downstream channels.

We have to show that the extended history is fault equivalent to the original history H. This is 
more or less trivial by construction. All we did was add a finite number of new packets and events. 
All of the new events occur at the "end of history" in the sense that they do not precede any
pre-existing events. All of the new events are added at P which is a halting process. Therefore by 
definition we only added vacuum events and vacuum packets. Any pre-existing event acquires at 
most a finite number of new successors and all of them occur at a halting process. Therefore all 
pre-existing vacuum events remain so in the extended history. 

Finally, assume that H is conforming. We must check that the extended history is still conforming. 

The vacuum loop does not change the set of removed processes. It does not change the set of 
halting processes. It does not change the view interval of any process. It adds only a finite number 
of dequeuing events to $E_P$, and it does not add any dequeuing events to any other process. Therefore
the extended history satisfies the Conforming Channel Axiom, the Conforming GMS Axiom and the 
Conforming Halt Axiom. The Conforming Notification Axiom is vacuously true because a lossless 
history has no dropped notifications. 

The Conforming Parent Axiom holds because any uninitialized process in the extended history is 
uninitialized in the original history and the vacuum loop does not create any new parent/child 
relationships. 

The Conforming Packet Axiom holds for every process other than P because H is conforming.
Suppose that P processes a removal notification of a process $X \neq P$. This implies that $r(X) < r(P)$
and therefore the channel $\overrightarrow{XP}$ is upstream. By losslessness this implies that the receive queue of
the channel is empty throughout the execution of the vacuum loop, and therefore the loop does not
add any new dequeuing events for packets on that channel.

If the processing of the removal notification of X occurs during the vacuum loop, then we have 
already shown that all the packets from X are already processed at this point. Since the vacuum 
loop does not add any new packets to the X-receive queue, the axiom holds in this case as well. 

If X = P the axiom holds because $P_{hlt}$ is the last event in $E_P$, according to the Order Foundation
Axiom. □

Definition 13. A history extension of the type that is described in Lemma 4 is called a vacuum
continuation of H at P. If the vacuum loop at P terminates then there is a maximal vacuum 
continuation of H at P, called the vacuum closure of H at P. If the vacuum closure of H at P is 
equal to H (meaning that the vacuum loop did not generate any new events or packets) then we say 
that H is vacuum closed at P. It is trivial to check that the vacuum closure of H at P is vacuum 
closed at P. 

Corollary 4. Let H be a lossless history and let P be a halting process in H. If H is vacuum closed
at P, then P processes all the packets that it receives and all the notifications that it is entitled to,
from $P_{run}$ to $P_{hlt}$.

Proof. It is easy to see that the vacuum loop does not terminate as long as there is an unprocessed
GMS notification that P is entitled to. This means that if H is vacuum closed at P then P must have
processed $P_{hlt}$.

In order to reach the processing of $P_{hlt}$, the vacuum loop must pass through steps (4) and (5) while
$v = r(P) - 1$. This clears all the receive queues as required. □


2.5.4 Transactional histories and the Fault Theorem 

Definition 14. A transactional history is a conforming, lossless history that meets the following 
additional restrictions: 

1. All notifications are processed to completion. 

2. All message broadcast requests are processed to completion. 

3. All received packets are processed to completion. 

Definition 15. A protocol PROTOCOL is vacuum convergent if for any conforming model in 
which it participates and for any halting process in any lossless history of that model the vacuum 
loop terminates. 

We are now ready to state the principal finding of this part of the paper - the Fault Theorem. 

Theorem 1 (Fault Theorem). Let H be a conforming history, and assume that PROTOCOL is vacuum
convergent. Then H can be extended to a fault equivalent transactional history $tv(H)$. $tv(H)$ is called
the transactional closure of H.

We start by showing that drops can be eliminated, and then we move on to simplifying process 
faults. 

Lemma 5. If a packet $k \in \overrightarrow{PQ}$ is dropped in a conforming history, then either P halts or Q halts.

Proof. By the Third Halting Axiom, the dequeuing event of k does not exist. By the Packet Order
Axiom, for any $k' \in \overrightarrow{PQ}$ that is queued after k, the dequeuing event of k' does not exist either.
Therefore, considering that $E_P$ and $E_Q$ are linearly ordered, the only packets $k' \in \overrightarrow{PQ}$ that have a
dequeuing event are the ones that are queued before k, and since (by the Order Foundation Axiom)
the $\prec$ relation is very well founded at P, there are only a finite number of such packets. Therefore
the channel is finite. By the Conforming Channel Axiom, either P is removed or Q halts. In the
former case, the Conforming Halt Axiom guarantees that P halts, and we are done. □

Lemma 6. In a non-stunted conforming history, a process halts if and only if it is removed. 

Proof. By the Conforming Halt Axiom we know that every removed process halts. To show the
converse, suppose a process P halts but is not removed. By the Conforming GMS Axiom process P
must have a finite view interval. Since P is not removed, $r(P) = \omega$ and so $\omega$ is finite. This implies
that the number of processes is finite. Let Q be any process. Since P halts, the First Halting Axiom
guarantees that the channel $\overrightarrow{PQ}$ is finite. By the Conforming Channel Axiom, P is removed or Q
halts. Since P is not removed, Q must halt. We conclude that there is a finite number of processes
and all of them halt. In other words the history is stunted. □
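
Lemma 7. In a conforming history, a halting process P only queues a finite number of packets
and only misses a finite number of packets and notifications:

$$\Bigl|\bigcup_{Q} \overrightarrow{PQ}\Bigr| < \infty$$

$$\Bigl|\bigl\{k \in \bigcup_{Q} \overrightarrow{QP} : k \text{ has no dequeuing event}\bigr\}\Bigr| < \infty$$

$$\bigl|\{i : j(P) \le i \le r(P) \text{ and the dequeuing event of } v_i(P) \text{ is not in } E_P\}\bigr| < \infty$$

Here Q ranges over all processes.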


Proof. Let P be a halting process in a conforming history. By the First Halting Axiom, P has a
finite event set. Therefore only a finite number of packets are queued by P. This proves the first
claim.

By the Conforming GMS Axiom, a halting process in a conforming history has a finite view interval.
Therefore P is eligible only for a finite number of view notifications. This proves the third claim.

The main difficulty is with the second claim.

Let Q be a process. First we want to show that P misses a finite number of packets from Q.

If Q halts, then Q has a finite event set, therefore Q queues a finite number of packets to $\overrightarrow{QP}$,
therefore P misses a finite number of packets from Q and we are done. So assume that Q does not
halt. This implies, incidentally, that the history is not stunted.

Since the history is conforming and not stunted, Lemma 6 implies that P is removed.

If $r(P) < j(Q)$ then the Process Liveness Axiom guarantees that $\overrightarrow{QP}$ is empty and we are done.
So we can assume that

$j(Q) \le r(P) \le r(Q)$

and therefore by the GMS Axiom the notification $v_{r(P)}(Q)$ exists. Since the history is conforming
and Q does not halt, there are no dropped notifications at Q (due to the Conforming Notification
Axiom) and therefore Q must process the removal notification of P.

By the Process Liveness Axiom, for every $k \in \overrightarrow{QP}$ the queuing event of k precedes the dequeuing
event of $v_{r(P)}(Q)$. By the Order Foundation Axiom, there is only a finite number of events in $E_Q$
that precede the dequeuing event of $v_{r(P)}(Q)$, and therefore $\overrightarrow{QP}$ is a finite channel. Therefore, P can
only miss a finite number of packets from Q.

We have established that P misses a finite number of packets from each source process. As long as
only a finite number of processes queue packets targeted at P, we are done. Since P has a finite
view interval, there is only a finite number of processes Q with $j(Q) \le r(P)$. For any Q with
$j(Q) > r(P)$ we have already established that the channel $\overrightarrow{QP}$ is empty. □

Theorem 2 (Lossless History Theorem). Every conforming history is fault equivalent to a lossless 
conforming history that contains the same packets and events. 


Proof. Let H be a conforming history. We are going to change H into a fault equivalent lossless
history by changing the faulting characteristics of notifications and packets, without adding or
subtracting any vacuum events and packets. The conforming axioms (see 2.5.1) are not affected
by such changes except for the Conforming Notification Axiom. However a lossless history has no
dropped notifications, so this and all other conforming axioms are going to remain valid.

We start with notifications by simply declaring that none of the notifications are dropped. There are 
two catches. First, we may have just added an infinite number of notifications into the notification 
queues of some processes. But it follows from the Conforming Notification Axiom that dropped 
notifications only exist at halting processes, and it follows from Lemma 7 that halting processes
only have a finite number of dropped notifications, so this problem does not occur. The other catch 
is that we have to prove that H is still a history. There are several assertions in the Second Halting 
Axiom that are related to dropped notifications which we now have to verify. Suppose that some
notification $v_i(P)$ is dropped in H.

• The Second Halting Axiom claims that if $v_i(P)$ is a dropped notification then its dequeuing event
does not exist. This assertion is not violated when we declare that $v_i(P)$ is not dropped, so we are
done in this case.

• The same axiom claims that if P does not halt and $v_i(P)$ is not dropped then its dequeuing event
does exist. However history H is conforming, and so by the Conforming Notification Axiom P
must halt. Therefore this assertion is not violated either.

We now move to packets. We make the following changes in the faulting characteristics of packets 
in the channel PQ: 

• We declare all the unprocessed upstream packets to be unsent and unreceived. 

• We declare all the unprocessed downstream packets to be sent and received. 
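
The two relabeling rules above amount to a single pass over the packets of a history. The following Python sketch illustrates them on a hypothetical `Packet` record; the field names and the function `make_lossless` are assumptions of this illustration, and processed packets are left untouched, as in the construction.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    direction: str      # "upstream" or "downstream"
    processed: bool     # True if the packet has a dequeuing event
    sent: bool
    received: bool


def make_lossless(packets):
    """Relabel the faults of unprocessed packets; keep processed ones as is."""
    result = []
    for p in packets:
        if p.processed:
            result.append(p)
        elif p.direction == "upstream":
            # Unprocessed upstream packets become unsent and unreceived.
            result.append(replace(p, sent=False, received=False))
        else:
            # Unprocessed downstream packets become sent and received.
            result.append(replace(p, sent=True, received=True))
    return result
```

After the pass no unprocessed packet is sent without being received, so no packet is dropped, and no events are added or removed.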

The same two catches apply here as well. By preventing packets from being sent we may saddle 
some processes with an infinite number of unsent packets. By forcing packets to be received without 
being processed we may be saddling some processes with an infinite number of received packets 
that linger in the process’ receive queues indefinitely. In addition, we have to verify all the relevant 
axioms. 

To see that we do not create an infinite number of packets that remain stuck in the send or receive 
queues of a process P, notice that if P halts then Lemma 7 guarantees that only a finite number of
unprocessed packets exist in P’s incoming and outgoing channels and so we cannot create infinities 
at P. If P does not halt then the situation is a little bit more complex and we have to look at the 
send queues and receive queues of P separately. 

It follows from the Conforming Halt Axiom that P is not removed and therefore $r(P) = \omega$. As
a result most of the outgoing channels of P are downstream. Since we force all the unprocessed
downstream packets to be sent we do not create any unsent packets on these outgoing channels.
The exceptions are the channels $\overrightarrow{PQ}$ that lead to some other process Q that is not removed. Lemma
6 implies that Q does not halt. The Conforming Channel Axiom implies that $\overrightarrow{PQ}$ is not finite and
therefore the Packet Order Axiom implies that all the packets on the channel are processed and as
a result we do not change the faulting characteristics of any of the packets on these channels.

On the other hand the incoming channels of P are all upstream, with the exception of the self
channel $\overrightarrow{PP}$. Since we force all the unprocessed upstream packets to be unsent, we do not create
any received-and-unprocessed packets on these channels. We have already seen that when both
ends of a channel do not halt, all the packets on the channel are processed. Therefore P processes
all of the packets on its self channel and as a result we do not change the faulting characteristics of
any self channel packets.

As far as axioms go, the only axiom that is related to the fault properties of packets is the Third 
Halting Axiom. The first part of this axiom claims that a non-halting process sends all of its queued 
packets. This part is not violated by our changes because we only declare a packet to be unsent if 
it emanates from a removed process P. By the Conforming Halt Axiom the process P halts. 

The second part of the axiom claims that a packet is not processed unless it is received. We only 
declare a packet to be unreceived if it is not processed. Therefore this part of the axiom is not 
violated. 

The third and last part of the axiom claims that packets keep getting processed as long as there is
no impediment such as a halting source or target, or a previous unreceived packet. Suppose P and
Q do not halt. Then the Third Halting Axiom implies that all the packets in $\overrightarrow{PQ}$ are sent. Lemma
5 implies that all the packets are received and as a result the Third Halting Axiom implies that all
the packets in the channel are processed. Therefore we do not touch any of these packets and the
third part of the axiom remains valid.

We have to show that the revised history is lossless (see Definition 11). This follows directly from
our construction. We obviously do not have any dropped packets or notifications anymore. As 
for channels, since we declared all the unprocessed packets in upstream channels to be unsent and 
unreceived we have cleared all the receive queues of these channels. Similarly, since we declared all 
the unprocessed packets in downstream channels to be sent and received, and since all the processed 
packets must have been sent to begin with, we have cleared all the send queues of these channels, 
as required. 

Our last task is to show that the new history is fault equivalent to the original history. But this is 
trivial since we did not add or subtract any events. □ 

Proof of the Fault Theorem. Theorem 2 established that the conforming history H is fault equivalent
to a lossless conforming history $H_1$. In a two step process, we will improve $H_1$ to a fault
equivalent transactional history $H_3$. The intermediate histories will have the following properties:

• $H_1$ is conforming and lossless.

• $H_2$ is conforming and lossless. In addition, all processes in $H_2$ are initialized and process all
of their notifications.

• $H_3$ is transactional.

The intermediate history $H_2$ is constructed by running the vacuum loop to completion at each
halting process. As one might expect, there are some complications.

The first complication is that for Lemma 4 to apply at a process P, we must make sure that its
donor E, if it exists, had dequeued the join notification of P. This can be guaranteed by traversing
the processes of H by increasing join view. Since $j(P) > j(E)$, this order guarantees that we run the
vacuum loop at E before we run it at P. The vacuum convergence property of PROTOCOL together
with Corollary 4 guarantee that E will process the join notification of P before we run the vacuum
loop at P.
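
The traversal order can be illustrated with a short Python sketch. It assumes that join views are given as a dictionary `join_view` and parent relationships as a dictionary `parent` from child to parent; both names, and the function `vacuum_order`, are hypothetical. Because a child always joins in a strictly later view than its parent, sorting by join view places every parent before its children.

```python
def vacuum_order(join_view, parent):
    """Order processes by increasing join view and check parents come first."""
    order = sorted(join_view, key=lambda p: join_view[p])
    position = {p: i for i, p in enumerate(order)}
    for child, par in parent.items():
        # j(child) > j(parent) is required for the order to be valid.
        if join_view[child] <= join_view[par]:
            raise ValueError(f"{child} does not join after its parent {par}")
        assert position[par] < position[child]
    return order
```

Processes with the same join view may be visited in any relative order, since neither can be the parent of the other.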

The second complication is that we may be running the vacuum loop an infinite number of times.
This only happens if $H_1$ is not stunted, in which case Lemma 6 implies that every halting process
is removed. At each finite step Lemma 4 guarantees that the resulting structure is a conforming,
lossless history that is fault equivalent to the original history H. But we have to show that all of
these properties are preserved at the limit. This is mostly but not entirely trivial.

The limiting structure $H_2$ is a conforming history because almost all the history axioms and
conforming history axioms either deal with views and view intervals, or with the events at a single
process, or with events at a pair of processes, or with packets in a single channel. Any such axiom is
either not affected by the vacuum loop at all, or is affected only by a finite number of applications
of the vacuum loop in our infinite sequence. Therefore all of these axioms follow immediately from
Lemma 4. The only exception is the Minimal Order Axiom, but this axiom is "continuous" in the
sense that it naturally commutes with limits. This is because each order relationship that exists at
the limit already exists after a finite number of applications of the vacuum loop.

$H_2$ is lossless for the same reason: the requirement that there be no dropped packets or notifications
is continuous - it is fulfilled at the limit if it is fulfilled at each step. The requirements on upstream
and downstream channels affect one channel at a time and each channel is affected only by the 
execution of the vacuum loop at the source and target of the channel. 

The non-trivial part is in showing that $H_2$ is fault equivalent to $H_1$. The argument in Lemma 4
was that the vacuum loop does not create any new non-vacuum events because it only introduces a
finite number of events, all of which are vacuum events themselves. Obviously a finiteness argument
of this sort cannot simply be carried over to the limit. Instead the fault equivalence of $H_1$ and $H_2$
arises from deeper roots.

Let e be any event in $H_2$ that has an infinite number of successor events in $H_2$. The fault equivalence
claim will follow if we can show that e already has an infinite number of successors in $H_1$. From
König's Lemma it follows that there is an infinite increasing sequence

$B = b_1 \prec b_2 \prec b_3 \prec \cdots$

of successors of e in $H_2$, and we can assume that B contains no gaps, meaning that any
consecutive pair of events $b_i \prec b_{i+1}$ in B is a primitive relation, meaning that either

• $b_i$ and $b_{i+1}$ are adjacent events in a process P.

• There is a packet k such that $b_i$ is the queuing event of k and $b_{i+1}$ is the dequeuing event of k.

• There is a parent/child pair of processes E/J such that $b_i$ is the dequeuing event of $v_{j(J)}(E)$ and
$b_{i+1} = J_{run}$.

All we need to show is that B cannot be made up exclusively of events that are outside of $H_1$,
namely events that are added by the extension process. We assume that B is made up of events
that are added by the vacuum loops and use the notation $P(b)$ to indicate the process that event b
occurs at. We reach a contradiction through a sequence of claims.

Claim: The sequence $P(b_1), P(b_2), \ldots$ contains an infinite number of different processes.

If only a finite number of processes appear in the sequence then there is some process $P_\infty$ that
appears an infinite number of times in the sequence. But that means that the sequence B contains
an infinite number of events at $P_\infty$, all of which are added, by our assumption, by the vacuum loop
at $P_\infty$. This contradicts our assumption that the vacuum loop completes in a finite number of steps
at $P_\infty$.

Claim: The sequence B must contain an infinite number of parent/child type pairs ^ 

Trun • 

If not, then we can remove an initial segment of B so that it does not contain any such pair. This 
would leave us only with consecutive pairs that occur at the same process or pairs of the type 
-< It follows from Lemma |T] that the vacuum loop only creates events for downstream 
packets. It follows that for all i, r(P(b_{i+1})) ≤ r(P(b_i)). As a result the sequence B can only involve
a finite number of processes, contradicting the first claim.

Claim: If B contains a pair k'^^ -< k^^, then k is a packet on a self-channel. 

Suppose that B contains an event bi = k^^ where k is not on a self-channel. If follows from the 
previous claim that there is a parent/child pair later on in the sequence. Therefore there is a 
parent/child pair E/ J and a segment in B of the form 

^fi<f 2 <---<fN<v,(j/EY^ 

where N > Q; all the events occur at P; and k is not on the self-channel ~E~t. Because k is not 
on the self-channel, the event k^^ must be generated by step (jl]) of the vacuum loop, which means 
that it is generated while v = r{E) — 1. This means that the next (and last) notification event that 
the vacuum loop creates at E is Prlt and not Vj(^j/EY^. 

Now we are ready to draw a contradiction. The sequence B contains an infinite number of
parent/child pairs and all other pairs in B are local to a process. Therefore there is a segment in B of
the form

— '^auN < fl -< f 2 < ■ ■ ■ fN -< Vj(K){JY^ ATrun 

where N > 0; E/J and J/K are parent/child pairs; and all the events f_1, f_2, ..., f_N occur at
J. It follows that the process J is uninitialized in H_1, since the J_run event is generated by the
vacuum loop at J. By the Conforming Parent Axiom the process J cannot have a child process K.
Contradiction.

We are not done yet. We have constructed a conforming, lossless history H_2 that is fault equivalent
to H_1 and which has very good properties. H_2 has no uninitialized processes and according to
Corollary [?] all the notifications and APP message broadcast requests in H_2 are processed to
completion. But H_2 is not transactional. The reason is that the vacuum loop at each process may create
unprocessed packets in the receive queues of non-self downstream channels. Since we generate H_2
from H_1 using a single pass over all the halting processes, these packets may never be revisited.

To make sure that we deal with these packets, we perform a second pass, the same way we performed
the first pass. As we have already shown, this procedure creates a conforming, lossless history H_3
that is fault equivalent to H_2. However, unlike the previous pass, all the processes in H_2 have
already processed all of their notifications. Therefore for every downstream non-self channel from
P to Q, the process P already knows that Q is removed. It now follows from the Process Liveness Axiom
that P does not queue any new packets to the channel during the vacuum loop. As a result H_3 is
transactional. □
3 The CBCAST Algorithm


3.1 Introduction 

The algorithm we present here is based on the outline in [5]. Our description is more detailed than
the authors', but the algorithm itself is a bare-bones version of the original. Our prime motivation
here is to create a description that is easy to check for correctness. Therefore there is no attempt
to accommodate frills like allowing multiple clusters, or optimizing time, space or communication
complexity. For such considerations refer back to [5].

From this point on we consider our model to be conforming and so all the histories that we analyze 
are assumed to be conforming. The properties of CBCAST as presented here do not necessarily hold 
for non-conforming models. 


3.2 Terminology 

3.2.1 Messages and delivery 

As mentioned in the introduction, we use the term messages to refer to the objects being broadcast
between processes with the expectation of virtually synchronous delivery. The CBCAST algorithm
implements these broadcasts using the underlying multicast of point-to-point packets. When a
packet is dequeued at a process and found to contain a message, the message is not delivered
immediately, because doing so might violate causality constraints. Following [5], we say that the
message is received once the process dequeues its packet from the receive queue of the channel, but
the message within it is delivered only at the moment when doing so is consistent with causality
constraints. The act of delivery is implemented by invoking the ApplyMessage callback, which applies
the message to the user's replicated data (see below). Every message carries with it three pieces of
metadata, denoted ORIG(msg), VIEW(msg) and VT(msg), describing the originator, view and vector time
of the message respectively. These notions are explained below.


3.2.2 Views, installations and view gaps 

The membership service sends a coherent stream of membership change notifications to member
processes, as described in the previous section. This creates a natural sequence of membership
views, starting at view(0) and progressing through view(1), view(2), etc. Each view is a finite
set of processes, and view(n) is computed from view(n − 1) by taking the notification of the
membership service and applying it to the earlier view, either by adding or removing a process.
Each process keeps track of the view notifications as they arrive from the membership service, and
then attempts to install them. Installation involves waiting for packets in flight to arrive at
their destinations. This flushing procedure is at the heart of the algorithm and is necessary in order
to guarantee virtually synchronous delivery across views. As a result of this wait, a process may
be several views behind as old installations are delayed and new view notifications keep arriving.
This gap is referred to as the view gap. If a process has received a notification of a new view, we
say that the process is aware of the view, regardless of whether the process has already installed
the view.

It should be noted that a process that leaves the group never re-joins. Even if a user re-starts a 
process, from the point of view of the membership service the re-started process is brand new. 
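
The way views are derived from the notification stream can be illustrated with a short sketch. It is only an illustration under stated assumptions: the tuple encoding of notifications and the names apply_notification, build_views and view_gap are hypothetical and do not appear in the algorithm's pseudo code.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding of a membership notification: ("JOIN" | "REMOVE", process id).
Notification = Tuple[str, str]


def apply_notification(view: FrozenSet[str], note: Notification) -> FrozenSet[str]:
    """Compute view(n) from view(n - 1) by adding or removing a single process."""
    kind, pid = note
    if kind == "JOIN":
        return view | {pid}
    if kind == "REMOVE":
        return view - {pid}
    raise ValueError(f"unknown notification kind: {kind}")


def build_views(view0: FrozenSet[str], notes: List[Notification]) -> List[FrozenSet[str]]:
    """Return [view(0), view(1), ...] for a stream of notifications."""
    views = [view0]
    for note in notes:
        views.append(apply_notification(views[-1], note))
    return views


def view_gap(installed: int, aware_of: int) -> int:
    """Number of views the process is aware of but has not installed yet."""
    return aware_of - installed
```

A process that has installed view(0) and has been notified of two further changes has a view gap of 2 in this sketch.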


3.2.3 Instability and forwarding 

A major difficulty in designing a coherent broadcast protocol is that it must make broadcasts look 
like atomic operations, where in reality message packets are sent to each target process individually 
and are received (or fail to be received) individually. Creating the perception of atomicity requires 
careful bookkeeping of the progress of each message by the sender and by each target. 

In order to broadcast a message to a set of members of the currently installed view, a sender process 
creates, for each target, a packet containing the message and then sends the packet through the 
appropriate channel. The sender process tries to keep track of the arrival of the packets. To do 
that it creates a set, called the instability set, that initially contains the identities of all the target 
processes. When the sender receives an acknowledgment of receipt from a target¹, it removes that
target from the instability set of the message. Likewise, if the group membership service notifies the 
sender that a target has been removed from the group, that target is removed from the instability 
set. A message with a non-empty instability set is called unstable. The sender keeps copies of all 
the unstable messages in a wait set. If the instability set becomes empty, the message is said to 
have become stable and is removed from the wait set. 
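
The sender-side bookkeeping described above can be sketched as follows. This is a simplified illustration, not the data structure used in the pseudo code of this section: the class name SenderBookkeeping and its method names are hypothetical, and the instability set is modeled as a plain set of target identifiers.

```python
from typing import Dict, Hashable, Iterable, Set


class SenderBookkeeping:
    """Tracks, per unstable message, the targets that have not acknowledged it."""

    def __init__(self) -> None:
        # wait set: message id -> instability set
        self.wait_set: Dict[Hashable, Set[str]] = {}

    def broadcast(self, msg_id: Hashable, targets: Iterable[str]) -> None:
        # Initially the instability set contains all the target processes.
        iset = set(targets)
        if iset:
            self.wait_set[msg_id] = iset

    def on_ack(self, msg_id: Hashable, target: str) -> None:
        iset = self.wait_set.get(msg_id)
        if iset is None:
            return  # already stable
        iset.discard(target)
        if not iset:
            del self.wait_set[msg_id]  # the message has become stable

    def on_removed(self, target: str) -> None:
        # A removed target no longer needs to acknowledge anything.
        for msg_id in list(self.wait_set):
            self.on_ack(msg_id, target)

    def is_unstable(self, msg_id: Hashable) -> bool:
        return msg_id in self.wait_set
```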

When a process receives a packet containing a message, it keeps a copy of the message in a receive 
set where it waits to be delivered. In addition the receiver has a responsibility to help the sender 
propagate the message to its intended target set. For that purpose the receiver keeps a copy of the 
received message in a forwarding queue. If the receiver learns of the removal of the sender it takes 
over its duties by re-broadcasting all the messages in the forwarding queue that were received from 
that sender. 

In a practical implementation the receiver tries to keep track of the stabilization of each received 
message, just like the sender does. Once a message stabilizes there is no more need to help propagate 
it, even if its sender is removed. Good bookkeeping is essential for keeping forwarding sets small and 
communication costs low. In our simplified implementation we do not keep track of stabilization 
on the receiver side, except for the very rudimentary measure of removing obsolete messages from 
the forwarding queues. 

Due to this forwarding mechanism we must differentiate between the originator of a message, which
is the process that originally broadcast a message, and the packet sender, which is the process that
happened to send the packet containing the message. Usually these two are the same, but in the
presence of process removals, a message may be carried from originator to target through a series of
forwarded packets. The forwarding procedure naturally leads to duplicated deliveries. The receiving
process removes the duplicates using the vector time (see below). There are simple ways to reduce
the number of duplicates and thus reduce the communication cost that is involved in forwarding.
We do not include these here for the sake of simplicity.

¹ In our implementation, acknowledgments are received directly from the target - one acknowledgment per packet.
In more practical implementations with lower communication costs, fewer acknowledgments are used and the sender
may have other, more indirect methods of deducing that a packet has been received.


3.2.4 Vector time 

Each process keeps track of causality relations between messages using a vector of natural numbers,
indexed by member processes, called the vector time. At each coordinate, the vector time contains
the serial number of the latest delivered message that originated at the process that corresponds
to that coordinate. The vector time is reset to zero every time a new view is installed. When a
message is originally broadcast by a process the vector time is incremented (at the originator's own
coordinate) and the metadata of the message - namely ORIG(msg), VIEW(msg) and VT(msg) - are
then set to the id of the process, the currently installed view and current vector time of the process,
respectively. These values remain fixed for the lifetime of the message. For a detailed discussion,
including proofs, of how the vector time is used to guarantee causality order preservation within
each view, see [5].


3.2.5 Cluster Initialization and original processes 

Birman et al. ([5]) assume that the cluster starts at view 1 with a single member. We have to relax
this assumption because a central tool in our analysis of CBCAST is the History Reduction Mapping,
which moves process joins back to the initial view. Therefore we allow an arbitrary
finite number of processes to belong to that view (view zero in our exposition). We call these the
original processes. These processes get started through an invocation of the protStart procedure.
We assume that the procedure gets called at each original process at exactly the same time and
with the same roster of members that includes exactly the set of original processes. While the
assumption of simultaneity is not realistic, we only need to use it with theoretical "reduced" cases
and not with actual clusters, which can still be assumed to start with a single member.

3.3 Outline of the Algorithm 

Each process, from the moment it joins the group to the moment it leaves, keeps track of the 
views as notifications are received from the membership service. For each view from the currently 
installed view to the most recently announced view, the process keeps a list of the members of that 
view. 

Usually, when the process needs to broadcast a message to the group, it fixes the metadata of the 
message with the current view and vector time, and then sends a packet containing the message 
to each member of the current view. The process also places the message in a wait set, where it 
tracks its stabilization as the recipient processes acknowledge the message. However, when there is 
a view gap (i.e. there are announced views that have not been installed yet) the process refrains 
from broadcasting messages or fixing their metadata, and it queues them instead. The messages 
are broadcast whenever the view gap closes. 

When a process receives a packet containing a message, it acknowledges its receipt to the sender
(the sender, remember, may be different from the originator). If the message is not a duplicate, it
is placed in the receive set until it becomes deliverable, and a copy of it is appended to
the tail of the forwarding queue of the sender of the message. When a message becomes deliverable
it is removed from the receive set and applied to the user's replicated data. At the time of delivery
the process updates its own vector time by incrementing the coordinate that corresponds to the
originator of the message.
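
The deliverability test can be sketched in a few lines, following the three conditions listed in the comments of the Scan procedure in Section 3.5. The sketch is an illustration only: vector times are modeled as dictionaries, and the function names is_deliverable and deliver are hypothetical.

```python
from typing import Dict, Iterable

VectorTime = Dict[str, int]


def is_deliverable(msg_orig: str, msg_view: int, msg_vt: VectorTime,
                   cur_view: int, vt: VectorTime, members: Iterable[str]) -> bool:
    """Return True if the message may be delivered now."""
    # 1. It is a current-view message.
    if msg_view != cur_view:
        return False
    # 2. It is the next expected message from its originator.
    if msg_vt.get(msg_orig, 0) != vt.get(msg_orig, 0) + 1:
        return False
    # 3. All the messages on which it depends have been delivered already.
    return all(msg_vt.get(p, 0) <= vt.get(p, 0)
               for p in members if p != msg_orig)


def deliver(msg_orig: str, vt: VectorTime) -> None:
    """At delivery time, increment the coordinate of the message's originator."""
    vt[msg_orig] = vt.get(msg_orig, 0) + 1
```

For example, if B broadcasts a message after delivering a message from A, a third process that receives B's message first must hold it in the receive set until A's message has been delivered.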

When a process is notified that another member process has been removed, it takes each message 
in the forwarding queue of that process and forwards it to all the live processes. These messages 
are forwarded with their original metadata (originator, view and vector time) unchanged. A copy 
of each message is placed in the wait set, to await stabilization. 
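
The take-over step can be sketched as follows. This is a simplified illustration: the function name take_over is hypothetical, messages are represented by opaque strings, and the wait set bookkeeping for the forwarded copies is omitted.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def take_over(fwd_queue: Dict[str, Deque[str]], removed: str,
              live: Set[str]) -> List[Tuple[str, str]]:
    """Drain the forwarding queue of a removed process.

    Returns the (message, target) packets to send, in the order in which
    the messages were originally received from the removed process.
    """
    packets: List[Tuple[str, str]] = []
    queue = fwd_queue.pop(removed, deque())
    while queue:
        msg = queue.popleft()  # metadata of msg is left unchanged
        for target in sorted(live):
            packets.append((msg, target))
    return packets
```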

Whenever a view gap exists, such as after a new view notification is received, the process must 
wait for its wait set to empty out before it can install the next view. Once the wait set becomes 
empty, the process sends a flush packet to all the live processes. This packet contains the value of 
the latest view known to the process (i.e. the current view plus the view gap). The process then 
waits to receive similar flush packets from all the live processes. Once that happens, the process 
installs the next view. It applies a view installation notification to the user’s replicated data and 
removes any obsolete messages from the receive set and the forwarding queue. 
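
The installation condition can be sketched as a simple predicate. This is an illustration under stated assumptions: the function name may_install is hypothetical, and the flush values are modeled as a dictionary from process identifiers to the highest flush value received from each process.

```python
from typing import Dict, Iterable


def may_install(flush: Dict[str, int], live: Iterable[str],
                cur_view: int, v_gap: int) -> bool:
    """Return True if pending views exist and every live process has flushed
    up to the latest view known to the local process."""
    if v_gap == 0:
        return False  # no pending view to install
    target = cur_view + v_gap  # the current view plus the view gap
    return all(flush.get(pid, 0) >= target for pid in live)
```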

Our implementation contains, in addition to flush packets, a related type of packet called a ghost 
packet. These packets are not necessary in a practical implementation. We use them to facilitate 
our reasoning about joining processes. A ghost packet is the "ghost" of a flush packet that would
have been sent by a child of an existing process, had that child already been born. 

When a new process joins the group, it must somehow synchronize its state with the state of the 
existing processes. This is done in two stages. Initially the new process starts life as a perfect 
replica of an existing process, the parent. We do not describe how this is done, and subsume it into 
the opaque membership service. In addition the new process must compensate for the natural race 
conditions that occur as a result of the fact that packets have been in flight between its parent and 
the other members of the group at the moment that it is born. This compensation is performed using 
the donation protocol, whereby each existing process other than the parent exchanges instability 
information with the new process. 


3.4 Variable and Function Definitions 

3.4.1 Global variables - the state of a process 

cur_view The number of the current view. 

v_gap The number of yet-uninstalled views of which we have been notified by the membership 
service. 

self Local process identifier. 

MSet The set of identifiers of the member processes of the current view. 

PendViewQueue A queue of pending view changes. Each view change is either a joining of a new 
process or the removal of an existing process. 
LiveSet The set of identifiers of all the live processes. This includes every process that is a member
of the current view or a known future view, excluding all known removed processes. 

ContactSet A subset of LiveSet that excludes all the processes that joined before the local process 
but from which a donation has not yet been received. 

vt[] A vector of natural numbers, indexed by the process identifiers of the members of the current 
view. This is the vector time of the local process. 

ReceiveSet The set of non-duplicate messages that were received and not yet discarded or delivered. 

FwdQueue[] A vector of message queues, indexed by process identifiers. For each process identifier, 
the queue contains copies of all the messages that were sent from and acknowledged to that 
process - excluding duplicates - in the order they were received. The queue includes messages 
that were merely forwarded by the sending process and did not originate from it. Each queue 
includes both delivered and undelivered messages. 

WaitSet The set of all the messages that the process broadcast or forwarded during the current 
view (note that forwarded messages may have a VIEW(msg) of a higher view even if they are
forwarded during the current view). Each message in the wait set is paired with an index and 
an instability set. The index indicates how many messages were broadcast or forwarded out 
of the process prior to the current message. The instability set contains, for every process 
that has not yet acknowledged the message, an index that indicates how many broadcast and 
forwarded messages were received from that process prior to the broadcasting or forwarding 
of the current message. WaitSet is organized as the union of two data structures: 

1. BcastWaitSet contains only the messages that were broadcast by the current process. 

2. FwdWaitSet contains only the messages that were forwarded by the current process. 

LaunchQueue A queue of all the unsent messages that need to be broadcast once the view gap 
closes. 

Replicated Data An opaque object containing the replicated user data. This data is managed by 
the user application in an application-specific way, subject to the rules listed in subsection 

ghost_height A number indicating the highest ghost value sent by the process so far. Ghost values 
are sent out in a strictly increasing sequence. 

flush_height A number indicating the highest flush value sent by the process so far. Flush values 
are sent out in a strictly increasing sequence. 

ghost[] A vector of view numbers, indexed by process identifiers. It keeps, for each process, the
highest ghost value that was received from that process. 

flush[] A vector of view numbers, indexed by process identifiers. It keeps, for each process, the 
highest flush value that was received from that process. 

mpkt_out A counter of outbound messages, made up of two fields:

• mpkt_out.b is the number of original messages broadcast by the process up to this point.

• mpkt_out.f is the number of messages forwarded by the process up to this point.

mpkt_in[] A vector of number pairs, indexed by process identifiers, that counts how many messages
have been received from each process so far. The two fields are:

• mpkt_in[P].b is the number of original P-messages received from P.

• mpkt_in[P].f is the number of forwarded messages received from P.

3.4.2 Packet and notification types 

n_JOIN(pid, p_pid) Notification that a new process with identifier pid joined the group as a clone of
the parent process with identifier p_pid.

n_REMOVE(pid) Notification that current member process with identifier pid was removed from the group.

p_MSG(msg) A message packet carrying a message msg.

p_ACK(msg) An acknowledgement packet carrying an acknowledgment of receipt of message msg.

p_GHOST(v) A ghost packet indicating ghost value v.

p_FLUSH(v) A flush packet indicating flush up to view v.

p_GHOST(> v) or p_FLUSH(> v) Stand for any packet p_GHOST(v') or p_FLUSH(v') where v' > v.

p_DONATE(donation) A donation packet containing a donation of instability information from an
existing process to a newly joined process.

p_CO-DONATE(co_donation) A co-donation packet containing a donation of instability information
from a newly joined process to an existing process.

3.4.3 Message metadata 

Each message is fixed with three pieces of metadata:

ORIG(msg) Originator, namely the process that broadcast the message originally.

VIEW(msg) View, which is the view of the originator when the message is broadcast.

VT(msg) Vector Time, which is (roughly) the vector time of the originator when the message is
broadcast.

The pair (VIEW(msg), VT(msg)) uniquely identifies the message msg.
3.5 Detailed CBCAST Algorithm Pseudo Code


Our pseudo code implements the various PROTOCOL interfaces (see Section 2):

protBroadcast(m)
protStart(roster, P)
protRun(P)
protRemove(P)
protJoin(P, E)
protPacket(k, S)

The protPacket(k, S) interface implementation uses the following procedures to process the various
types of packets that are defined in the CBCAST protocol:

ReceiveMessage(msg, sender)
ReceiveAck(msg, sender)
ReceiveGhost(view, sender)
ReceiveFlush(view, sender)
ReceiveDonation(donation, sender)
ReceiveCoDonation(co_donation, sender)

In addition there are three service routines, called from several places, that deal with view
installation and message delivery:

CheckFlush()
TryToInstall()
Scan()


Interface protBroadcast(msg)

    Input: msg is the message that is being broadcast
    if v_gap > 0 then
1       append msg to the tail of LaunchQueue;
    end
    else
        increment mpkt_out.b;
        // calculate the message vector time
        let vt' = vt;
        let vt'[self] = vt[self] + mpkt_out.b − mpkt_in[self].b;
        // fix message metadata before broadcasting
        ORIG(msg) ⇐ self;
        VIEW(msg) ⇐ cur_view;
        VT(msg) ⇐ vt';
2       queue p_MSG(msg) to ContactSet; // multicast the message packets
        let index = mpkt_out; // locates msg in the outgoing message sequence
        let iset[] = mpkt_in[]; // the initial instability set for msg
        add (msg, index, iset[]) to BcastWaitSet;
    end
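
The vector time calculation in protBroadcast can be illustrated with a small worked example. The sketch below only restates the arithmetic of the pseudo code: the function name message_vector_time and its parameter names are hypothetical, bcast_out stands for mpkt_out.b after the increment, and bcast_in_self stands for mpkt_in[self].b. By the definitions of these counters, their difference is the number of the process's own broadcasts, including the current one, that it has not yet received back from itself.

```python
from typing import Dict


def message_vector_time(vt: Dict[str, int], self_id: str,
                        bcast_out: int, bcast_in_self: int) -> Dict[str, int]:
    """Compute vt' for a new broadcast without modifying the local vector time.

    All coordinates are copied from vt, except the originator's own
    coordinate, which is advanced by the number of own broadcasts that have
    not yet been received back (bcast_out - bcast_in_self).
    """
    stamped = dict(vt)
    stamped[self_id] = vt[self_id] + bcast_out - bcast_in_self
    return stamped
```

For instance, a process A with vt[A] = 2 that has received back two of its own broadcasts and is now issuing its fourth one stamps the new message with 4 at its own coordinate.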
Interface protStart(roster, pid)

    Input: roster is the set of original members of the group (view zero members), pid is the
    process identifier of the local process

    GroundState(); // create the initial value of ReplicatedData
    let cur_view = 0;
    let v_gap = 0;
    let self = pid;
    let MSet = roster;
    let PendViewQueue = ∅;
    let LiveSet = ContactSet = roster;
    let vt = ∅;
    let ReceiveSet = ∅;
    let FwdQueue = ∅;
    let WaitSet = ∅;
    let LaunchQueue = ∅;
    let ghost_height = flush_height = 0;
    let ghost[] = flush[] = ∅;
    let mpkt_out = {f = 0; b = 0};
    let mpkt_in[] = ∅;
    foreach id ∈ roster do
        create vt[id] = 0;
        create FwdQueue[id] = ∅;
        create mpkt_in[id] = {f = 0; b = 0};
        create ghost[id] = flush[id] = 0;
        ApplyJoin(id);
    end
    // We launch the main APP thread asynchronously
    // It will start executing at some indeterminate point in the future
    execute Main(self);





Interface protRun(pid)

    Input: pid is the process identifier of the new process
    increment v_gap;
    append (JOIN, pid) to the tail of PendViewQueue;
    add pid to LiveSet;
    let ContactSet = {pid};
    create FwdQueue[pid] = ∅;
    let BcastWaitSet = ∅;
    foreach (msg, index, iset[]) ∈ FwdWaitSet do
        let index.b = 0;
    end
    let LaunchQueue = ∅;
    let flush_height = ghost_height;
    create ghost[pid] = ghost_height;
    create flush[pid] = ghost_height;
    let mpkt_out.b = 0;
    create mpkt_in[pid] = mpkt_out;
    let self = pid;
    CheckFlush();
Interface protRemove(rem_proc)

    Input: rem_proc is the identifier of the removed process
1   increment v_gap;
2   append (REMOVE, rem_proc) to the tail of PendViewQueue;
    remove rem_proc from LiveSet;
    remove rem_proc from ContactSet;
    foreach (msg, index, iset) ∈ WaitSet do
        discard iset[rem_proc];
        if iset = ∅ then
            remove (msg, index, iset) from WaitSet; // message is stable
        end
    end
    discard mpkt_in[rem_proc];
3   while FwdQueue[rem_proc] ≠ ∅ do
        pop msg from the head of FwdQueue[rem_proc];
        // create an instability set that contains
        // all the live processes
        increment mpkt_out.f;
4       let index = mpkt_out;
        let iset[] = mpkt_in[];
5       queue p_MSG(msg) to ContactSet; // multicast the message packets
6       add (msg, index, iset[]) to FwdWaitSet;
    end
    discard FwdQueue[rem_proc];
    discard ghost[rem_proc];
    discard flush[rem_proc];
    CheckFlush();
Interface protJoin(jn_proc, p_proc)

    Input: jn_proc is the identifier of the joining process. p_proc is the identifier of the parent
    process.
    increment v_gap;
    append (JOIN, jn_proc) to the tail of PendViewQueue;
    add jn_proc to LiveSet;
    add jn_proc to ContactSet;
    create FwdQueue[jn_proc] = ∅;
    foreach (msg, index, iset[]) ∈ WaitSet do
        if iset[p_proc] exists then
1           create iset[jn_proc] = {f = iset[p_proc].f; b = 0};
        end
    end
    create ghost[jn_proc] = ghost[p_proc];
    // The received flush value of the new process is inherited from the
    // received ghost height of the parent
    create flush[jn_proc] = ghost[p_proc];
    create mpkt_in[jn_proc] = {f = mpkt_in[p_proc].f; b = 0};
    let donation = (WaitSet, mpkt_in[], ghost_height, flush_height);
    queue p_DONATE(donation) to jn_proc;
    CheckFlush();
Interface protPacket(k, sender)

    Input: k is the packet being received, sender is the process identifier of the sender of the
    packet
    switch cont(k) do
        case p_MSG(msg):
            ReceiveMessage(msg, sender);
        endsw
        case p_ACK(msg):
            ReceiveAck(msg, sender);
        endsw
        case p_GHOST(view):
            ReceiveGhost(view, sender);
        endsw
        case p_FLUSH(view):
            ReceiveFlush(view, sender);
        endsw
        case p_DONATE(donation):
            ReceiveDonation(donation, sender);
        endsw
        case p_CO-DONATE(co_donation):
            ReceiveCoDonation(co_donation, sender);
        endsw
    endsw
Procedure ReceiveMessage(msg, sender)

    Input: msg is the message being received, sender is the process identifier of the sender of the
    message packet
    queue p_ACK(msg) to sender; // acknowledge receipt of the message
    if ORIG(msg) = sender then
        increment mpkt_in[sender].b;
    end
    else
        increment mpkt_in[sender].f;
    end
1   // Check for duplicates:
    if VIEW(msg) < cur_view then
2       discard p_MSG(msg); // obsolete messages are duplicates (Lemma 34)
    end
    else if VIEW(msg) = cur_view and vt[ORIG(msg)] ≥ VT(msg)[ORIG(msg)] then
        discard p_MSG(msg); // duplicate - message already delivered
    end
    else if msg ∈ ReceiveSet then
        discard p_MSG(msg); // duplicate - message already received
    end
    else
3       add msg to ReceiveSet;
4       append msg to the tail of FwdQueue[sender];
5       Scan(); // scan ReceiveSet and deliver all the deliverable messages
    end


Procedure ReceiveAck(msg, sender)

    Input: msg is the message being acknowledged, sender is the process identifier of the sender
    of the acknowledgement packet
    // the following if statement will always succeed
    if (msg, index, iset) ∈ WaitSet exists then
        discard iset[sender];
        if iset = ∅ then
            remove (msg, index, iset) from WaitSet; // message is stable
            CheckFlush();
        end
    end
Procedure ReceiveGhost(view, sender)

    Input: view is the ghost height of the sender, sender is the process identifier of the sender of
    the packet
    let ghost[sender] = view; // ghost[sender] always increases


Procedure ReceiveFlush(view, sender)

    Input: view is the flush height of the sender, sender is the process identifier of the sender of
    the packet
    let flush[sender] = view; // flush[sender] always increases
    TryToInstall();
Procedure ReceiveDonation(donation, sender)

    Input: donation is the donation being received, sender is the process identifier of the sender
    of the donation packet
    add sender to ContactSet;
    let co_donation = (WaitSet, mpkt_in[], ghost_height, flush_height);
    queue p_CO-DONATE(co_donation) to sender; // Co-donate local state to the sender
    // Process, in order, all the untimely packets
    let UNT_g = {(msg, index, iset[]) ∈ WaitSet | iset[sender] exists};
    Define height_1((msg, index, iset[]) ∈ UNT_g) = index.b + index.f;
    Define height_2((msg, index, iset[]) ∈ UNT_g) = 0;
    let UNT_p = {(msg, index, iset[]) ∈ donation.WaitSet | iset[self] exists};
    Define height_1((msg, index, iset[]) ∈ UNT_p) = iset[self].b + iset[self].f;
    Define height_2((msg, index, iset[]) ∈ UNT_p) = index.b + index.f;
    let UNT = UNT_g ∪ UNT_p;
    sort UNT using the lexicographical order (height_1, height_2);
    // we process the elements of UNT in order
    foreach (msg, index, iset[]) ∈ UNT do
        if (msg, index, iset[]) ∈ UNT_p then
            if index.b + index.f > mpkt_in[sender].b + mpkt_in[sender].f then
                // we found an untimely message packet from the sender to the parent,
                // and we process its clone now
1               ReceiveMessage(msg, sender);
            end
        end
        if (msg, index, iset[]) ∈ UNT_g then
            if index.b + index.f < donation.mpkt_in[self].b + donation.mpkt_in[self].f then
                // we found a message packet from the parent whose acknowledgement
                // packet was untimely, so we process its clone now
2               ReceiveAck(msg, sender);
            end
        end
    end
    let ghost[sender] = donation.ghost_height;
    let flush[sender] = donation.flush_height;
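
The ordering step in the donation procedure sorts the union of the two sets by the pair (height_1, height_2), compared lexicographically. A minimal sketch of that ordering is given below. The Entry type, its field names and the tags "own" and "donor" are hypothetical; only the sort key reflects the pseudo code.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    source: str    # hypothetical tag: "own" wait set or "donor" wait set
    msg: str       # opaque message identifier
    height_1: int  # primary sort key
    height_2: int  # secondary sort key (0 for entries of the local wait set)


def processing_order(entries: List[Entry]) -> List[Entry]:
    """Sort entries by the lexicographical order (height_1, height_2)."""
    return sorted(entries, key=lambda e: (e.height_1, e.height_2))
```

In this sketch, entries that share the same height_1 are ordered by height_2, so a local entry with height_2 = 0 precedes a donated entry with a positive height_2.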
Procedure ReceiveCoDonation(co_donation, sender)

    Input: co_donation is the co-donation being received, sender is the process identifier of the
    sender of the co-donation packet
    // Process, in order, all the untimely and post-critical packets
    let UNT_g = {(msg, index, iset[]) ∈ co_donation.WaitSet | iset[self] exists};
    Define height_1((msg, index, iset[]) ∈ UNT_g) = iset[self].b + iset[self].f;
    Define height_2((msg, index, iset[]) ∈ UNT_g) = index.b + index.f;
    let UNT_p = {(msg, index, iset[]) ∈ WaitSet | iset[sender] exists};
    Define height_1((msg, index, iset[]) ∈ UNT_p) = index.b + index.f;
    Define height_2((msg, index, iset[]) ∈ UNT_p) = 0;
    let UNT = UNT_g ∪ UNT_p;
    sort UNT using the lexicographical order (height_1, height_2);
    // we process the elements of UNT in order
    foreach (msg, index, iset[]) ∈ UNT do
        if (msg, index, iset[]) ∈ UNT_g then
            if index.b + index.f > mpkt_in[sender].b + mpkt_in[sender].f then
                // we found one of two things here:
                // either an untimely forwarded message packet from the parent that
                // we process now as a message from the sender
                // or a post-critical, pre-donation forwarded message packet from the
                // sender that we process now
1               ReceiveMessage(msg, sender);
            end
        end
        if (msg, index, iset[]) ∈ UNT_p then
            if index.b + index.f < co_donation.mpkt_in[self].b + co_donation.mpkt_in[self].f then
                // we found a timely message packet from us to the parent whose
                // acknowledgement was untimely, so we process its clone now
2               ReceiveAck(msg, sender);
            end
        end
    end
    let ghost[sender] = co_donation.ghost_height;
    let flush[sender] = co_donation.flush_height;
3   TryToInstall();
Procedure CheckFlush()

    if FwdWaitSet ≠ ∅ then
        return; // there are unstable forwarded messages, do not send any ghosts or flushes
    end
    if ghost_height < cur_view + v_gap then
        let ghost_height = cur_view + v_gap;
1       queue p_GHOST(ghost_height) to ContactSet; // multicast the ghost packets
    end
    if BcastWaitSet ≠ ∅ then
        return; // there are unstable original messages, do not send any flushes
    end
    if flush_height < cur_view + v_gap then
        let flush_height = cur_view + v_gap;
2       queue p_FLUSH(flush_height) to ContactSet; // multicast the flush packets
    end
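
The control flow of CheckFlush can be mirrored in a short executable sketch. It is an illustration only: the FlushState class and the check_flush function are hypothetical names, the two wait sets are reduced to emptiness flags, and the packets that would be multicast to ContactSet are returned as a list instead of being queued.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlushState:
    cur_view: int
    v_gap: int
    ghost_height: int = 0
    flush_height: int = 0


def check_flush(state: FlushState, fwd_wait_empty: bool,
                bcast_wait_empty: bool) -> List[Tuple[str, int]]:
    """Return the ghost and flush packets that should be multicast now."""
    out: List[Tuple[str, int]] = []
    if not fwd_wait_empty:
        return out  # unstable forwarded messages: no ghosts, no flushes
    target = state.cur_view + state.v_gap
    if state.ghost_height < target:
        state.ghost_height = target
        out.append(("GHOST", target))
    if not bcast_wait_empty:
        return out  # unstable original messages: no flushes
    if state.flush_height < target:
        state.flush_height = target
        out.append(("FLUSH", target))
    return out
```

The sketch makes the ordering visible: a ghost value can be sent while original broadcasts are still unstable, whereas a flush value is sent only once both wait sets are empty, and each value is sent at most once.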
Procedure TryToInstall

    // check whether all the members are fully flushed
1   foreach pid ∈ LiveSet do
        if flush[pid] < cur_view + v_gap then
            return; // some members are not flushed - wait
        end
    end

    while v_gap > 0 do // this loop installs all the pending views
2       // remove obsolete messages from ReceiveSet
        foreach msg ∈ ReceiveSet do
            if VIEW(msg) = cur_view then
                remove msg from ReceiveSet;
            end
        end

3       // remove obsolete messages from FwdQueue
        foreach pid ∈ LiveSet do
            foreach msg ∈ FwdQueue[pid] do
                if VIEW(msg) = cur_view then
                    remove msg from FwdQueue[pid];
                end
            end
        end

        increment cur_view;
        decrement v_gap;

        pop notification from the head of PendViewQueue;
        if notification = (JOIN, pid) then
            add pid to MSet;
            ApplyJoin(pid); // deliver notification to APP
            if pid = self then
                execute Main(self); // launch the main APP thread asynchronously
            end
        else if notification = (REMOVE, pid) then
            remove pid from MSet;
            ApplyRemoval(pid); // deliver notification to APP
        end

4       reset vt; // New view now installed. vt coordinates reflect new membership
        Scan(); // high view messages may now become deliverable
    end

    // v_gap = 0 - time to broadcast all pending messages
    while LaunchQueue ≠ [] do
        pop msg from the head of LaunchQueue;
5       protBroadcast(msg);
    end
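The following Python sketch models the flush gate and the view-installation loop of TryToInstall. It is a simplification under stated assumptions: the state is a plain dictionary with hypothetical key names, and the FwdQueue cleanup, the vt reset, the call to Scan and the LaunchQueue drain are omitted. Its purpose is to show that the loop leaves the sum cur_view + v_gap unchanged.

```python
from collections import deque

def try_to_install(st):
    """st is a dict holding a simplified copy of the protocol state."""
    target = st["cur_view"] + st["v_gap"]
    # Gate: every live member must have flushed up to the target view.
    if any(st["flush"][pid] < target for pid in st["live_set"]):
        return []
    installed = []
    while st["v_gap"] > 0:
        # Drop messages stamped with the view that is being retired.
        st["receive_set"] = [m for m in st["receive_set"]
                             if m["view"] != st["cur_view"]]
        st["cur_view"] += 1
        st["v_gap"] -= 1
        kind, pid = st["pend_view_queue"].popleft()
        if kind == "JOIN":
            st["mset"].add(pid)
        elif kind == "REMOVE":
            st["mset"].discard(pid)
        installed.append((st["cur_view"], kind, pid))
    return installed
```

Each iteration increments cur_view and decrements v_gap together, so the sum observed before the call equals the sum observed after it.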











Procedure Scan

    // look for deliverable messages. A message is deliverable if all the
    // following are true:
    // 1. It is a current-view message
    // 2. It is the next expected message from its originator
    // 3. All the messages on which it depends have been delivered already
    let deliverable_messages_found = false;
    foreach msg ∈ ReceiveSet do
        if VIEW(msg) = cur_view then
            // msg is a current-view message
            if VT(msg)[ORIG(msg)] = vt[ORIG(msg)] + 1 then
                // msg is the next expected message from its originator
                let all_dependents_delivered = true;
                foreach pid ∈ MSet and pid ≠ ORIG(msg) do
                    if VT(msg)[pid] > vt[pid] then
                        let all_dependents_delivered = false;
                    end
                end

                if all_dependents_delivered = true then
                    // msg is deliverable
                    let deliverable_messages_found = true;
                    increment vt[ORIG(msg)];
                    remove msg from ReceiveSet;
                    let originator = ORIG(msg);
                    strip out metadata stamps VIEW(msg), VT(msg) and ORIG(msg);
1                   ApplyMessage(msg, originator); // deliver message to APP
                end
            end
        end
    end

    if deliverable_messages_found = true then
        Scan(); // try to see if more messages can now be delivered
    end
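The delivery test performed by Scan can be sketched in Python as follows. This is an illustration under stated assumptions: messages are modeled as dictionaries with hypothetical keys `view`, `orig`, `vt` and `body`, the recursive re-scan is written as a loop, and delivery to the application is replaced by appending to a list.

```python
def scan(receive_set, vt, cur_view, mset):
    """Deliver every message that satisfies the three conditions of Scan."""
    delivered = []
    progress = True
    while progress:  # iterative form of the recursive re-scan
        progress = False
        for msg in list(receive_set):
            o = msg["orig"]
            if msg["view"] != cur_view:
                continue  # 1. not a current-view message
            if msg["vt"][o] != vt[o] + 1:
                continue  # 2. not the next expected message from o
            if any(msg["vt"][p] > vt[p] for p in mset if p != o):
                continue  # 3. a message it depends on is still missing
            vt[o] += 1
            receive_set.remove(msg)
            delivered.append((o, msg["body"]))
            progress = True
    return delivered
```

A message that depends on an undelivered message stays in the receive set during the first pass and is picked up by a later pass once its dependency has been delivered.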
4 Basic Properties Of The CBCAST Algorithm


In subsequent sections we will analyze the CBCAST protocol in depth. Right now we want to highlight 
some of its important basic properties. 


4.1 Some CBCAST invariants 

Definition 16.

• Let T be a transaction. The trigger of T is denoted $trig(T)$.

• Let $e \in E_P$. Then e belongs to a unique transaction T. We denote $T = trans(e)$ and use $trig(e)$ as shorthand for $trig(trans(e))$.

• Let T be a transaction. The view of T is $view(trig(T))$ and is denoted by $view(T)$. Since the side effects of T cannot contain notification events, all the events in T share the same view $view(T)$.

Definition 17. Let P be any process in a group that executes the CBCAST protocol. Let var be any state variable (see \tl.4.1^) and let $e \in E_P$ be any event other than the join event of P, in other words $e \succeq \nu_{j(P)}(P)^{PR}$.

If e is a trigger event we use the notation $var_{P@e}$ to denote the value of the variable var at process P at the onset of the transaction $trans(e)$. We use the notation $var^{P@e}$ to denote the value of the variable var at process P at the conclusion of the transaction $trans(e)$.

If e is a queuing event we use the notation $var_{P@e}$ to denote the value of the variable var at process P at the moment when the queuing event occurs. Since queuing events do not change state variables, there is no distinction here between the pre and post values.

When $e = \nu_{j(P)}(P)^{PR}$ the processing of e causes the execution of the protStart procedure or the protRun procedure, according as P is an original process (a member of view zero) or a late joining process. We use the same definition of $var^{P@e}$ that we use for any other trigger. However for an original process we define

$var_{P@e} = var^{P@e}$

and for a late joining process we define $var_{P@e}$ to be the value of var right before the invocation of the CheckFlush procedure at the end of the protRun procedure.

This last part of the definition is admittedly not elegant. However it does have some intuitive justification, in the sense that the endpoint of protStart and the pre-CheckFlush point in the protRun procedure are the first points in the life of a process where it is fully initialized as a CBCAST process.

Definition 18. Let P be a process and let $e \in E_P$ be any trigger event. Let $U_e$ be the set of processes that had not yet contacted P at the time that e occurred. These are processes that joined the group before P did, but for which P has not yet processed a donation packet. Formally, $U_e$ is the set of all processes $Q \in \mathcal{P}$ such that $j(Q) < j(P)$ and such that, if d is the donation packet from Q to P, then either the dequeuing event of d at P follows e or that dequeuing event does not exist.

In particular if P is a member of view zero then $U_e = \emptyset$.

The set $U_e$ is called the uncontacted set of e.
Lemma 8 (CBCAST Omnibus Lemma).

Let P be any process and let e be any trigger event in P. Then the following relations hold at P:

1. $(\mathit{cur\_view} + \mathit{v\_gap})^{P@e} = view(e)$

2. $\mathit{LiveSet}^{P@e} = \{Q \in \mathcal{P} \mid j(Q) \le (\mathit{cur\_view} + \mathit{v\_gap})^{P@e} < r(Q)\}$

   $\mathit{ContactSet}^{P@e} = \mathit{LiveSet}^{P@e} \setminus U_e$

3. The entries in the vectors $\mathit{FwdQueue}[]^{P@e}$, $\mathit{ghost}[]^{P@e}$ and $\mathit{flush}[]^{P@e}$ correspond exactly to the members of $\mathit{LiveSet}^{P@e}$.

4. The entries in the vector $\mathit{vt}[]^{P@e}$ correspond exactly to the members of $\mathit{MSet}^{P@e}$.

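5. For any $X \in \mathit{LiveSet}^{P@e}$

   $\mathit{flush}[X]^{P@e} \le \mathit{ghost}[X]^{P@e} \le (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$

   If $X \notin \mathit{ContactSet}^{P@e}$ then the right inequality is strict. If $\mathit{v\_gap}^{P@e} = 0$ then the inequalities are actually equalities.

6. For any $X \in \mathit{LiveSet}_{P@e} \cap \mathit{LiveSet}^{P@e}$

   $\mathit{ghost}[X]_{P@e} \le \mathit{ghost}[X]^{P@e}$
   $\mathit{flush}[X]_{P@e} \le \mathit{flush}[X]^{P@e}$

   in other words the values of ghost[X] and flush[X] are non-decreasing.

7. If $e = \nu_i(P)^{PR}$ and $X \in \mathit{LiveSet}^{P@e}$ then

   $\mathit{ghost}[X]^{P@\nu_i(P)^{PR}} \le \mathit{ghost\_height}_{X@\nu_i(X)^{PR}}$
   $\mathit{flush}[X]^{P@\nu_i(P)^{PR}} \le \mathit{flush\_height}_{X@\nu_i(X)^{PR}}$

8. $\mathit{flush}[P]^{P@e} \le \mathit{flush\_height}^{P@e} \le \mathit{ghost\_height}^{P@e} \le (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$

9. If $\mathit{v\_gap}^{P@e} > 0$ then

   $\mathit{ghost\_height}^{P@e} = (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$ if and only if $\mathit{FwdWaitSet}^{P@e} = \emptyset$
   $\mathit{flush\_height}^{P@e} = (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$ if and only if $\mathit{WaitSet}^{P@e} = \emptyset$

10. If $\mathit{v\_gap}^{P@e} = 0$ then $\mathit{LaunchQueue}^{P@e} = \emptyset$.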
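As an informal aid to reading the lemma, the Python sketch below checks a few of its claims (1, 5, 8, 9 and 10) against a single state snapshot. Everything about the representation is hypothetical: the snapshot is a dictionary with assumed key names, WaitSet is assumed to be the union of FwdWaitSet and BcastWaitSet, and the claims that compare two points in time (6 and 7) are not covered.

```python
def violated_claims(s, view_e):
    """Return the numbers of the checked claims that fail on snapshot s."""
    top = s["cur_view"] + s["v_gap"]
    bad = []
    if top != view_e:  # claim 1
        bad.append(1)
    for x in s["live_set"]:  # claim 5
        ok = s["flush"][x] <= s["ghost"][x] <= top
        if x not in s["contact_set"]:
            ok = ok and s["ghost"][x] < top
        if s["v_gap"] == 0:
            ok = ok and s["flush"][x] == s["ghost"][x] == top
        if not ok:
            bad.append(5)
            break
    p = s["self"]
    if not (s["flush"][p] <= s["flush_height"] <= s["ghost_height"] <= top):
        bad.append(8)  # claim 8
    if s["v_gap"] > 0:  # claim 9
        wait_set = s["fwd_wait_set"] | s["bcast_wait_set"]  # assumed union
        ghost_ok = (s["ghost_height"] == top) == (not s["fwd_wait_set"])
        flush_ok = (s["flush_height"] == top) == (not wait_set)
        if not (ghost_ok and flush_ok):
            bad.append(9)
    if s["v_gap"] == 0 and s["launch_queue"]:  # claim 10
        bad.append(10)
    return bad
```

A checker of this kind could be run after every simulated transaction in a test harness; it does not replace the inductive proof that follows.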

Proof. The proof proceeds by induction on e, where we assume by induction that the lemma holds for f if either $f \prec e$ or $view(f) < view(e)$. This is possible thanks to Corollary 3 and Lemma 2.

We have three types of triggers to consider: message broadcast request events, notification events and packet dequeuing events. In the latter case we will use the following notation throughout: k is the packet that is being dequeued (so e is the dequeuing event of k) and X is the process that queued the packet k.

We start by picking off the easy cases. We show that if e is a packet dequeuing event or a message broadcast request event then claims (1), (2) and (3) all hold. Claim (7) holds vacuously for such e because it only pertains to notification events.

For the remaining cases, since every trigger event causes some CBCAST procedure to be executed, 
we simply go over each procedure and show that if the inductive hypothesis is assumed then the 
lemma holds at the end of the execution of the procedure. 

To prove claim (1) when e is a packet dequeuing event we need two facts. First, the proof of Corollary 3 demonstrates that $view(e) = view(e')$, where $e' \prec e$ is the immediate predecessor of e in $E_P$. Second, a lengthy but routine inspection of the pseudo-code shows that the protPacket procedure does not change the sum cur_view + v_gap (the TryToInstall utility procedure increments cur_view and decrements v_gap zero or more times, but does not change their sum). These facts taken together with the inductive hypothesis give

$view(e) = view(e') = (\mathit{cur\_view} + \mathit{v\_gap})^{P@e'} = (\mathit{cur\_view} + \mathit{v\_gap})_{P@e} = (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$

The exact same argument holds when e is a message broadcast request event (with protBroadcast replacing protPacket).


To prove the first part of claim (2) when e is a packet dequeuing event or a message broadcast request event, notice that neither protPacket nor protBroadcast changes the value of LiveSet, and as we already saw these procedures do not change the value of cur_view + v_gap either. As a result this part of the claim follows by induction.

An immediate corollary is that if e is a packet dequeuing event then $X \in \mathit{LiveSet}^{P@e}$. This is because the definition of $view(e)$ and the Conforming Packet Axiom imply that $j(X) \le view(e) < r(X)$. It follows from claim (1) and the first part of claim (2) that $X \in \mathit{LiveSet}^{P@e}$.


The second part of the claim is a bit more complicated. As long as e is not the dequeuing of a donation packet it is easy to check that ContactSet does not change and $U_e = U_{e'}$, where $e'$ is the immediate predecessor of e in $E_P$, and so this part of the claim follows by induction.

If e dequeues the donation packet from X then it is easy to see that $X \notin U_e$ and that $U_e = U_{e'} \setminus \{X\}$. We already demonstrated that $X \in \mathit{LiveSet}^{P@e}$. The protPacket procedure executes the ReceiveDonation procedure, which in turn adds X to ContactSet, and so by induction

$\mathit{ContactSet}^{P@e} = \mathit{ContactSet}_{P@e} \cup \{X\} = \mathit{ContactSet}^{P@e'} \cup \{X\}$
$= (\mathit{LiveSet}^{P@e'} \setminus U_{e'}) \cup \{X\} = (\mathit{LiveSet}^{P@e} \setminus U_{e'}) \cup \{X\}$
$= (\mathit{LiveSet}^{P@e} \cup \{X\}) \setminus (U_{e'} \setminus \{X\}) = \mathit{LiveSet}^{P@e} \setminus U_e$
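The set manipulation in the donation case can be checked on a small concrete instance. The Python sketch below uses made-up process names; it only illustrates the identity between the sets and is not part of the proof.

```python
live = {"P", "X", "Y", "Z"}      # LiveSet, unchanged by the transaction
u_prev = {"X", "Z"}              # U_{e'}: uncontacted before the donation from X
contact_prev = live - u_prev     # inductive hypothesis: ContactSet = LiveSet \ U_{e'}

x = "X"
u_now = u_prev - {x}                 # U_e = U_{e'} \ {X}
contact_now = contact_prev | {x}     # ReceiveDonation adds X to ContactSet

step1 = (live - u_prev) | {x}
step2 = (live | {x}) - (u_prev - {x})
```

Because X is already a member of LiveSet, adding it to LiveSet changes nothing, and both intermediate expressions equal LiveSet with the new uncontacted set removed.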


Proving claim (3) when e is a packet dequeuing event or a message broadcast request event amounts to a routine check that only notification related procedures, namely protStart, protRun, protJoin and protRemove, actually add or remove entries from the mentioned vectors or change LiveSet.

We prove the remaining claims for non-notification events by examining the protBroadcast procedure and each of the service procedures that are invoked by the protPacket procedure. A claim has to be tested against a procedure only if the procedure changes one or more of the variables that are mentioned in the claim.


protBroadcast

This procedure only affects claims (9) and (10). If v_gap = 0 it adds a record to WaitSet, but in that case claim (9) is vacuously true. If v_gap > 0 it adds a message to LaunchQueue, but in this case claim (10) is vacuously true.

ReceiveMessage

This procedure does not affect any of the claims because it does not change any of the relevant variables (this includes the invocation of the Scan procedure, which also does not change any of the relevant variables).

ReceiveAck

This procedure affects claims (8) and (9) by making changes to WaitSet, ghost_height and flush_height but no other variable.

Claim (8) is true by induction before CheckFlush is called. The claim remains true if CheckFlush sets ghost_height = cur_view + v_gap. There are two possible impediments to this action. If FwdWaitSet ≠ ∅ then CheckFlush does nothing and the claim remains true by induction. If ghost_height is already high before CheckFlush is called the claim remains true regardless of whether CheckFlush raises flush_height or not. So (8) remains true in all cases.

Claim (9) is vacuously true if v_gap = 0, so assume that v_gap > 0. The procedure may shrink, but does not enlarge, either FwdWaitSet or BcastWaitSet, and if it removes any record, it invokes the CheckFlush procedure. If WaitSet does not lose a record then CheckFlush is not called and nothing changes. If WaitSet loses a record, we have to look at the following cases:


FwdWaitSet remains non-empty after the record loss

In this case CheckFlush does nothing and the claim remains true by induction.

FwdWaitSet becomes empty while BcastWaitSet remains non-empty

In this case it follows from the inductive hypothesis that

$\mathit{ghost\_height}_{P@e} < \mathit{cur\_view} + \mathit{v\_gap}$
$\mathit{flush\_height}_{P@e} < \mathit{cur\_view} + \mathit{v\_gap}$

and therefore CheckFlush sets $\mathit{ghost\_height}^{P@e} = \mathit{cur\_view} + \mathit{v\_gap}$ and does not touch flush_height. These changes preserve the claims of (9).

FwdWaitSet becomes empty while BcastWaitSet remains empty

In this case it follows from the inductive hypothesis that

$\mathit{ghost\_height}_{P@e} < \mathit{cur\_view} + \mathit{v\_gap}$
$\mathit{flush\_height}_{P@e} < \mathit{cur\_view} + \mathit{v\_gap}$

and therefore CheckFlush sets

$\mathit{ghost\_height}^{P@e} = \mathit{flush\_height}^{P@e} = \mathit{cur\_view} + \mathit{v\_gap}$

These changes preserve the claims of (9).

FwdWaitSet remains empty while BcastWaitSet remains non-empty

In this case it follows from the inductive hypothesis that

$\mathit{ghost\_height}_{P@e} = \mathit{cur\_view} + \mathit{v\_gap}$
$\mathit{flush\_height}_{P@e} < \mathit{cur\_view} + \mathit{v\_gap}$

and therefore CheckFlush does nothing and the claim remains true by induction.

FwdWaitSet remains empty while BcastWaitSet becomes empty

In this case it follows from the inductive hypothesis that

$\mathit{ghost\_height}_{P@e} = \mathit{cur\_view} + \mathit{v\_gap}$
$\mathit{flush\_height}_{P@e} < \mathit{cur\_view} + \mathit{v\_gap}$

and therefore CheckFlush sets $\mathit{flush\_height}^{P@e} = \mathit{cur\_view} + \mathit{v\_gap}$ and does not touch ghost_height. These changes preserve the claims of (9).

ReceiveGhost

This procedure affects claims (5) and (6), but only with respect to the sender process X. We know that $\mathit{LiveSet}_{P@e} = \mathit{LiveSet}^{P@e}$ and that $X \in \mathit{LiveSet}^{P@e}$, so both claims must be verified for X.

Also note that it follows from claim (3) that for every process in $\mathit{LiveSet}^{P@e}$ all the fields in claims (5) and (6) are well defined.

Let $k = p_{ghost}(v)$.

To prove claim (6) we first have to note that the value of ghost_height (and flush_height) is non-decreasing, as is easy to verify by looking at the pseudo-code and specifically at the CheckFlush procedure.

Assume first that k is not the first packet from X that carries ghost information (in addition to ghost packets, donation packets and co-donation packets also carry ghost information). It follows from the Packet Order Axiom and from the monotonicity of ghost_height that $v \ge \mathit{ghost}[X]_{P@e}$ and we are done.

If k is the first such packet then $\mathit{ghost}[X]_{P@e}$ is the initial value assigned by the protStart, protRun or protJoin procedure, where these cases occur when $j(X) = j(P) = 0$; $j(X) \le j(P) \ne 0$; and $j(X) > j(P)$, respectively.

In case $j(P) = 0$ we have $\mathit{ghost}[X]_{P@e} = 0 \le v$ and we are done.

In case $j(X) = j(P) \ne 0$ we have $X = P$ and one can verify by looking at the protRun procedure that

$\mathit{ghost}[X]_{P@e} = \mathit{ghost}[P]_{P@e} = \mathit{ghost}[P]^{P@\nu_{j(P)}(P)^{PR}}$
$= \mathit{ghost\_height}_{P@\nu_{j(P)}(P)^{PR}} \le \mathit{ghost\_height}_{P@k^{QU}} = v$

where the last inequality follows from the monotonicity of ghost_height.
Look at the case $j(X) < j(P)$. By induction (on claim (7))

$\mathit{ghost}[X]_{P@e} = \mathit{ghost}[X]^{P@\nu_{j(P)}(P)^{PR}} \le \mathit{ghost\_height}_{X@\nu_{j(P)}(X)^{PR}}$

and by the monotonicity of ghost_height

$\mathit{ghost\_height}_{X@\nu_{j(P)}(X)^{PR}} \le \mathit{ghost\_height}_{X@k^{QU}} = v$

and we are done.

Now look at the case $j(X) > j(P)$. In this case X is a late joining process. Let E be the parent of X. Then

$\mathit{ghost}[X]_{P@e} = \mathit{ghost}[E]^{P@\nu_{j(X)}(P)^{PR}} \le \mathit{ghost\_height}_{E@\nu_{j(X)}(E)^{PR}}$

and since X inherits its value of ghost_height from the value of ghost_height in E we have

$\mathit{ghost\_height}_{E@\nu_{j(X)}(E)^{PR}} = \mathit{ghost\_height}_{X@\nu_{j(X)}(X)^{PR}}$

And by monotonicity we have

$\mathit{ghost\_height}_{X@\nu_{j(X)}(X)^{PR}} \le \mathit{ghost\_height}_{X@k^{QU}} = v$

and we are done in this case as well.

To prove claim (5) let $e'$ be the trigger of the X-transaction that queued the packet k. Then $e' \preceq k^{QU} \prec e$, where e is the dequeuing event of k. It follows from Lemma 2 that $view(e') \le view(e)$.

By claim (6) that we just proved and by induction we know that

$\mathit{flush}[X]^{P@e} = \mathit{flush}[X]_{P@e} \le \mathit{ghost}[X]_{P@e} \le \mathit{ghost}[X]^{P@e} = v \le \mathit{ghost\_height}^{X@e'}$
$\le (\mathit{cur\_view} + \mathit{v\_gap})^{X@e'} = view(e') \le view(e) = (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$

We have to prove the additional assertions in claim (5) in the cases where $\mathit{v\_gap}^{P@e} = 0$ and $X \notin \mathit{ContactSet}^{P@e}$.

In the case $\mathit{v\_gap}^{P@e} = 0$ we have by induction

$\mathit{flush}[X]^{P@e} = \mathit{flush}[X]_{P@e} = (\mathit{cur\_view} + \mathit{v\_gap})_{P@e} = (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$

The case $X \notin \mathit{ContactSet}^{P@e}$ cannot occur here. To show that, we only have to prove that $X \notin U_e$. The rest follows from the fact that $X \in \mathit{LiveSet}^{P@e}$ and from claim (2).

If $j(X) \ge j(P)$ then $X \notin U_e$ by definition. If $j(X) < j(P)$ then the first packet that X sends to P is a donation packet, which must precede k. Therefore by definition $X \notin U_e$ and so it must be in ContactSet at this point. This takes care of claim (5).


ReceiveFlush

This procedure updates flush[X] and then calls the TryToInstall service procedure. We will start by ignoring TryToInstall and show that the inductive hypothesis still holds before TryToInstall is invoked. Later we show that TryToInstall preserves all the claims.















This procedure affects claims (5), (6) and (8), but for the first two claims only with respect to the sender process X. We know that $\mathit{LiveSet}_{P@e} = \mathit{LiveSet}^{P@e}$ and that $X \in \mathit{LiveSet}^{P@e}$, so both claims must be verified for X.

Also note that it follows from claim (3) that for every process in $\mathit{LiveSet}^{P@e}$ all the fields in claims (5) and (6) are well defined.

Let $k = p_{flush}(v)$.

We start with claim (5). To prove it, we first show that $\mathit{flush}[X]^{P@e} \le \mathit{ghost}[X]^{P@e}$. This requires a bit of digging. The packet k is queued by X through the execution of the CheckFlush procedure. This procedure may or may not queue a ghost packet of the same height, to the same target set, immediately prior to queuing the flush packet. If a ghost packet is queued then it follows from the Packet Order Axiom that P processes the ghost packet immediately prior to the current flush packet, resulting in an equality $\mathit{flush}[X]^{P@e} = \mathit{ghost}[X]^{P@e}$. The only difficulty arises if a ghost packet is not queued.

The CheckFlush procedure is invoked by X as part of the execution of a notification transaction or an acknowledgement packet processing transaction. An inspection of the pseudo-code easily shows that in the notification case a queuing of a flush packet is always preceded by the queuing of a ghost packet, because all three procedures - protRemove, protJoin and protRun - increment v_gap, which results, according to claim (8), in ghost_height being low.

Let $e'$ be the trigger of the X-transaction that queued the packet k. We can assume that $e'$ is an acknowledgement packet processing event. Since $e'$ results in the queuing of a flush packet of height v destined to P without the queuing of a ghost packet of the same height, we know by pseudo-code inspection that

$\mathit{flush\_height}_{X@e'} < (\mathit{cur\_view} + \mathit{v\_gap})_{X@e'} = v$ and therefore $v > 0$
$\mathit{ghost\_height}_{X@e'} = (\mathit{cur\_view} + \mathit{v\_gap})_{X@e'} = v$
$P \in \mathit{ContactSet}_{X@e'}$

We know from claim (1) that

$view(e') = (\mathit{cur\_view} + \mathit{v\_gap})^{X@e'} = (\mathit{cur\_view} + \mathit{v\_gap})_{X@e'} = v$

Let $f = \nu_v(X)^{PR}$ be the most recent view change notification preceding $e'$. If the notification $\nu_v(X)$ is a removal or joining of some process $Q \ne X$ then f is not the first event in $E_X$ and we know by induction from claim (8) that

$\mathit{ghost\_height}_{X@f} \le (\mathit{cur\_view} + \mathit{v\_gap})_{X@f} = v - 1 < view(e')$

If $f = X_{run}$ then $j(X) = v > 0$ and X has a parent E. The protRun procedure does not change the value of ghost_height until CheckFlush is called and therefore by induction

$\mathit{ghost\_height}_{X@f} = \mathit{ghost\_height}_{E@\nu_v(E)^{PR}} \le (\mathit{cur\_view} + \mathit{v\_gap})_{E@\nu_v(E)^{PR}} = v - 1 < view(e')$

taking care to interpret $var_{X@e'}$ correctly for protRun (see Definition 17).
Therefore there is some trigger event $f \preceq f' \preceq e'$ at X such that

$\mathit{ghost\_height}_{X@f'} < \mathit{ghost\_height}^{X@f'} = v$

Code inspection shows that the transaction $trans(f')$ must invoke the CheckFlush procedure and must result in the queuing of a ghost packet of height v. If P is in the target set of this multicast then we are done. If it is not, then P must join ContactSet sometime between $trans(f')$ and $trans(e')$. Code inspection shows that P can join ContactSet either after a join notification or as a result of sending a donation to X.

By the definition of f there cannot be any notification events between $f'$ and $e'$ and therefore process X must receive a donation packet d from P at some point between these two transactions. This in turn causes X to queue a co-donation packet to P that includes its current ghost_height value.

Since the dequeuing of d at X lies between $f'$ and $e'$, and since we know by direct code inspection that ghost_height is non-decreasing, it follows that the ghost height that X co-donates is equal to v. From the Packet Order Axiom it follows that the co-donation packet is processed by P before k is processed. As a result

$\mathit{ghost}[X]_{P@e} = v$

And we are done showing that $\mathit{flush}[X]^{P@e} \le \mathit{ghost}[X]^{P@e}$. By induction we can conclude (using claim (5)) that

$\mathit{ghost}[X]^{P@e} = \mathit{ghost}[X]_{P@e} \le (\mathit{cur\_view} + \mathit{v\_gap})_{P@e} = (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$

which concludes the proof of the first part of claim (5).

We have to prove the additional assertions in claim (5) in the cases where $\mathit{v\_gap}^{P@e} = 0$ and $X \notin \mathit{ContactSet}^{P@e}$. The case $X \notin \mathit{ContactSet}^{P@e}$ cannot occur here for the same reason as in the case of ReceiveGhost.

If v_gap = 0 then we can assume by induction that

$\mathit{flush}[X]_{P@e} = (\mathit{cur\_view} + \mathit{v\_gap})_{P@e}$

We now use claim (6) (which we will prove shortly) to conclude that

$(\mathit{cur\_view} + \mathit{v\_gap})^{P@e} = (\mathit{cur\_view} + \mathit{v\_gap})_{P@e} = \mathit{flush}[X]_{P@e} \le \mathit{flush}[X]^{P@e} \le (\mathit{cur\_view} + \mathit{v\_gap})^{P@e}$

And we are done.

The proof that the ReceiveFlush procedure preserves (6) is almost identical to the same proof for the ReceiveGhost procedure. The only difference is that in the case where k is the first packet that carries flush information and where $j(X) > j(P)$, the process X inherits its value of flush_height from the value of ghost_height at the parent E, and not from the value of flush_height of the parent. Similarly process P initializes flush[X] from ghost[E] and not from flush[E]. Therefore we get

$\mathit{flush}[X]_{P@e} \le \mathit{ghost\_height}_{E@\nu_{j(X)}(E)^{PR}}$

and since X inherits its value of flush_height from the value of ghost_height in E we have

$\mathit{ghost\_height}_{E@\nu_{j(X)}(E)^{PR}} = \mathit{flush\_height}_{X@\nu_{j(X)}(X)^{PR}}$

And by monotonicity we have

$\mathit{flush\_height}_{X@\nu_{j(X)}(X)^{PR}} \le \mathit{flush\_height}_{X@k^{QU}} = v$

and we are done.

Proof that the procedure preserves (8) is only required when X = P, i.e. when k is a self packet. In that case we can use the monotonicity of flush_height to conclude

$\mathit{flush}[P]^{P@e} = v = \mathit{flush\_height}_{P@k^{QU}} \le \mathit{flush\_height}_{P@e} = \mathit{flush\_height}^{P@e}$

Finally the procedure invokes the TryToInstall procedure, which preserves all the claims as we show further on.

protStart

It is trivial to check that the procedure initializes P to a state that conforms to the requirements of the lemma.

protRemove

Let R be the process that is being removed. Then by definition $r(R) = view(e)$.

To see that this procedure preserves (1) notice that its event e is a notification event, meaning that if $e'$ is the preceding event in $E_P$ then $view(e) = view(e') + 1$ while the procedure increments v_gap, maintaining the equality.

It preserves the first part of claim (2) because it removes R from LiveSet. R is the only process in LiveSet that no longer meets the condition $j(Q) \le \mathit{cur\_view} + \mathit{v\_gap} < r(Q)$ as v_gap is incremented. No other process is affected because no other process satisfies either $r(Q) = r(R)$ or $j(Q) = r(R)$.

It preserves the second part of claim (2) because it removes R from LiveSet and ContactSet, while not affecting $U_e$ since e is not a donation packet dequeuing event.

It preserves (3) because it removes the R coordinate from all the required vectors. It does not affect (4).

It preserves (5) because it shrinks LiveSet, makes v_gap > 0 and increments the right hand side of the inequality while not affecting the left hand side.

To see why it preserves claims (8) and (9), notice that by incrementing v_gap it forces ghost_height and flush_height to be low without touching them. As a result the call to CheckFlush at the end has the following effects:

• ghost_height becomes high if and only if FwdWaitSet = ∅.

• flush_height becomes high if and only if WaitSet = ∅.
Therefore flush_height becomes high only if ghost_height becomes high, and therefore by induction flush_height ≤ ghost_height in all cases.

The procedure increments v_gap, making (10) vacuously true.

It preserves (6) because it does not change the values of ghost[X] and flush[X] for $X \ne R$.

Finally we have to show that the procedure preserves claim (7). Let $i = view(e)$ and let $X \ne R$ be any process that remains live after the view change. If P does not dequeue any ghost (flush), donation or co-donation packet from X between $\nu_{i-1}(P)^{PR}$ and $e = \nu_i(P)^{PR}$ then we get by induction, for either ghost or flush

$\mathit{ghost}[X]^{P@\nu_i(P)^{PR}} = \mathit{ghost}[X]^{P@\nu_{i-1}(P)^{PR}} \le \mathit{ghost\_height}_{X@\nu_{i-1}(X)^{PR}} \le \mathit{ghost\_height}_{X@\nu_i(X)^{PR}}$

$\mathit{flush}[X]^{P@\nu_i(P)^{PR}} = \mathit{flush}[X]^{P@\nu_{i-1}(P)^{PR}} \le \mathit{flush\_height}_{X@\nu_{i-1}(X)^{PR}} \le \mathit{flush\_height}_{X@\nu_i(X)^{PR}}$

and we are done. Otherwise, let $e'$ be the last dequeuing event at P, with $view(e') = i - 1$, of a ghost (flush), donation or co-donation packet k from X. Let v be the ghost (flush) height carried by the packet. Then we have in each case respectively

$\mathit{ghost}[X]^{P@\nu_i(P)^{PR}} = \mathit{ghost}[X]^{P@e'} = v = \mathit{ghost\_height}_{X@k^{QU}}$

$\mathit{flush}[X]^{P@\nu_i(P)^{PR}} = \mathit{flush}[X]^{P@e'} = v = \mathit{flush\_height}_{X@k^{QU}}$

and from the Piggyback Axiom it follows that $k^{QU} \prec \nu_i(X)^{PR}$ and so in each case respectively

$\mathit{ghost\_height}_{X@k^{QU}} \le \mathit{ghost\_height}_{X@\nu_i(X)^{PR}}$
$\mathit{flush\_height}_{X@k^{QU}} \le \mathit{flush\_height}_{X@\nu_i(X)^{PR}}$

and we are done.

protJoin

Let J be the joining process and let E be its parent. Then by definition $j(J) = view(e)$.

This procedure preserves (1), (8), (9) and (10) for the exact same reasons as protRemove.

It satisfies the first part of claim (2) because it adds J to LiveSet. J is the only process that newly meets the condition $j(Q) \le \mathit{cur\_view} + \mathit{v\_gap} < r(Q)$ as v_gap is incremented. No other process is affected, because no other process satisfies either $j(Q) = j(J)$ or $r(Q) = j(J)$.

It preserves the second part of claim (2) because it adds J to LiveSet and ContactSet, while not affecting $U_e$ since e is not a donation packet dequeuing event.

It preserves (3) because it adds a J coordinate to all the required vectors. It does not affect (4).

It preserves (5) because it makes v_gap > 0 and increments the right hand side of each existing inequality while not affecting the left hand side. The new inequalities for the J coordinate are inherited from the original ghost inequality for its parent E.

It preserves (6) because it does not change the values of ghost[X] and flush[X] for $X \ne J$.
Finally we have to show that the procedure preserves claim (7). Let $i = view(e)$ and let X be any process that is live after the view change. If $X \ne J$ then the claim holds following the exact same argument that we used in the protRemove case. In the case X = J we have

$\mathit{ghost}[J]^{P@e} = \mathit{flush}[J]^{P@e} = \mathit{ghost}[E]^{P@e}$

and since we have already proven the case $X \ne J$ we know that

$\mathit{ghost}[E]^{P@e} \le \mathit{ghost\_height}_{E@\nu_i(E)^{PR}}$

Since J inherits its values of ghost_height and flush_height from its parent's value of ghost_height we have

$\mathit{ghost\_height}_{E@\nu_i(E)^{PR}} = \mathit{ghost\_height}_{J@\nu_i(J)^{PR}} = \mathit{flush\_height}_{J@\nu_i(J)^{PR}}$

and we are done.

Notice that in the last equation, by definition, the value $\mathit{flush\_height}_{J@\nu_i(J)^{PR}}$ reflects the fact that the protRun procedure copies the initial value of ghost_height into flush_height (see Definition 17).

protRun

Let J be the new process and let E be its parent. Then by definition $j(J) = view(e)$. Let $e_E = \nu_{j(J)}(E)^{PR}$ and let $e' \in E_E$ be the event immediately preceding $e_E$ in E. It follows from the Parent Axiom that $e_E$ exists and therefore $e'$ exists as well, since $e_E$ is not the first event in $E_E$.

Process J starts life with the exact same state that its parent had when it dequeued the $n_{JOIN}(J, E)$ notification, which is the same state it had at the conclusion of $trans(e')$. By induction, E satisfied all the claims at that point. For most claims this means that they are automatically satisfied at that point in J as well, as long as we interpret the expressions $var^{J@e'}$ to simply mean the initial value of var, ignoring the fact that the value of $e'$ there is undefined in J. However there are a number of exceptions.

Claim (1) becomes ill-defined because $e'$ is not defined in J. The second part of claim (2) is not satisfied because it depends on the value of $U_{e'}$, which is ill-defined at J and at any rate is not inherited from E. Claim (6) becomes ill-defined because $\mathit{LiveSet}_{J@e'}$ has no clear interpretation in J. Similarly, claim (7) depends on the meaning of $e'$ and so is hard to interpret in J. Claim (8) references the self value flush[P] where P is the local process. Whenever we need to rely on any of these ill-defined inductive claims as we go, we will present an explicit calculation that relies directly on the well-defined E version of these claims.

The protRun procedure preserves (1) because

$view(e) = j(J) = view(e') + 1 = (\mathit{cur\_view} + \mathit{v\_gap})^{E@e'} + 1$
$= (\mathit{cur\_view} + \mathit{v\_gap})^{J@e'} + 1 = (\mathit{cur\_view} + \mathit{v\_gap})^{J@e}$

where the last equality follows because the protRun procedure increments v_gap.
Claims (9), (10) and most of (8) are preserved by protRun for the exact same reasons as in the protRemove case. The only missing piece is that the inequality

$\mathit{flush}[J]^{J@e} \le \mathit{flush\_height}^{J@e}$

follows because the self flush height flush[J] is initialized to be equal to $\mathit{ghost\_height}^{J@e'}$. The same value is used to initialize flush_height, and thereafter flush_height can only increase.

protRun satisfies the first part of claim (2) for the same reason that protJoin does.

protRun satisfies the second part of claim (2) because at the outset $U_e$ contains every process X whose join view is lower than $j(J)$. This includes every member of LiveSet except J itself. Since the protRun procedure adds J to LiveSet while setting ContactSet = {J}, this makes the equations true.

It preserves (3) because it adds a J coordinate to all the required vectors. It does not affect (4).

It preserves (5) because it makes v_gap > 0 and increments the right hand side of each existing inequality while not affecting the left hand side. The new inequalities for the J coordinate hold because the values of flush[J] and ghost[J] are both initialized to $\mathit{ghost\_height}^{J@e'} \le \mathit{ghost\_height}^{J@e}$ and because we have shown that claim (8) is preserved.

It preserves (6) because the protRun procedure does not change the initial values of the ghost[] and flush[] vectors.

Finally we have to show that the procedure preserves claim (7). Let $i = view(e)$ and let X be any process that is live after the view change. If $X \ne J$ then the values of ghost[X] and flush[X] are inherited from E without change


and since we have already proved all the claims for E (the protJoin case) we know that

/ eT\PR 

g-/ 70 st[X]^g,„.(^)PR < g/)ost[X] ^ < g/)ost./)e/g/)t^@„.(^)PR 

f/us/7[X]^Q^.(^^PR < flush[X]^®'"'^^^ < flush.heighi^®'’'^^^ 

and we are done. The case X = J follows because the protRun procedure sets

$\mathit{ghost}[J]^{J@e} = \mathit{flush}[J]^{J@e} = \mathit{ghost\_height}_{J@\nu_{j(J)}(J)^{PR}} = \mathit{flush\_height}_{J@\nu_{j(J)}(J)^{PR}}$


IReceiveDonationI 

Let S be the sender of the donation packet. Since e is a donation packet dequeuing event in 
this case, S falls out of Ug. This is compensated for by adding S to ContactSet. This preserves 
claim ©• Then the procedure makes a sequence of invocations of the |ReceiveMessage| and 
IReceiveAckI procedures which preserve all the claims as we have already shown. 
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Finally the procedure updates ghost[S] and flush[S] from the donated values of ghost.height 
and flush_height respectively. Remember that the donation packet is sent by the protJoin 
procedure as part of the j{P) view change notification processing. Therefore 


gh05t[Sf®'' = g/70St./7e/g-/7t5@„^j^^(s)PR 
flushisf®^' = ffus/7./)e/g/)ts@„.jpj(s)PR 
It follows from claim [8] of the inductive hypothesis that 


$$\text{flush\_height}_{S@v_{j(P)}(S)^{PR}} \le \text{ghost\_height}_{S@v_{j(P)}(S)^{PR}} \le (\text{cur\_view} + \text{v\_gap})_{S@v_{j(P)}(S)^{PR}} = j(P) - 1 < \text{view}(e) = (\text{cur\_view} + \text{v\_gap})^{@e}$$

This proves most of claimjSJ Since the inequalities are strict we have to show that v_gap > 0. It 
follows from claim ([0]), which we will prove shortly, that flush[S] is non-decreasing, and therefore 
the inequality was strict at the start of the transaction. Therefore by induction v_gap > 0. 

Claim ® follows by the exact same reasoning that we used for the |ReceiveGhost| and 
|ReceiveFlush| procedures, while claim ([7]) is not relevant at a non-notification transaction. 

|ReceiveCoDonation| 

Let S be the sender of the co-donation packet, and let e' be the trigger of the donation 
transaction at which S queues the co-donation packet. 

The |ReceiveCoDonation| procedure makes a sequence of calls that preserve all the claims and 
then updates ghost[S] and flush[S] from the co-donated values. Then 


$$\text{ghost}[S]^{@e} = \text{ghost\_height}_{S@e'}$$
$$\text{flush}[S]^{@e} = \text{flush\_height}_{S@e'}$$

From claim |S] of the inductive hypothesis and the Piggyback Axiom we know that 


$$\text{flush\_height}_{S@e'} \le \text{ghost\_height}_{S@e'} \le (\text{cur\_view} + \text{v\_gap})_{S@e'} \le \text{view}(e') \le \text{view}(e) = (\text{cur\_view} + \text{v\_gap})^{@e}$$

which proves most of claim ([5]). Process S must be contacted at this point, so we do not have 
to show that the inequalities are strict. But if one of the inequalities happens to be strict, 
then we have to show that v_gap > 0. This follows from the same argument as in the case of 
|ReceiveDonation|. 

Claim ([S]) follows by the exact same reasoning that we used for the |ReceiveGhost| and 
|ReceiveFlush| procedures, while claim ([7]) is not relevant at a non-notification transaction. 

Finally the procedure calls |TryToInstall|, which also preserves all the claims as we will see. 

|TryToInstall| 

This is a service procedure that is called by other procedures. However it preserves © and 
(O because it always increments cur_view and decrements v_gap together. Also, the procedure 
makes no changes to v_gap unless all the flush[] values are high. It preserves (HI because it 
resets vt whenever it changes MSet. 

It preserves (jS]) because it does not touch either ghost_height, flush_height or the self flush 
height flush[P] while keeping cur_view + v_gap fixed (this includes possible invocations of the 
|protBroadcast| and |Scan| procedures). 

It preserves (0 because it either results in v_gap = 0 which makes ([9]) vacuously true, or else 
it fails to install views in which case it does not change any variables and therefore preserves 
(HI by induction. 

It preserves ([10]) because it either results in v_gap > 0 which makes ([10]) vacuously true, or 
else it starts out with v_gap = 0 in which case ([10]) is true by induction, or else it starts out 
with v_gap > 0 and ends with v_gap = 0 in which case it empties out LaunchQueue, making 
([10]) true. 

|TryToInstall| does not affect claims (H| and (HI. 

|Scan| 

This is a service procedure that is called by other procedures. It does not affect any of the 
relevant variables so it has no effect on any of the claims. 

□ 


Corollary 5. Let P be a process and let e ∈ E_P be any trigger event. Suppose that 

$$\text{LiveSet}_{P@e} \not\subseteq \text{ContactSet}_{P@e}$$

Then there is a process X ∈ LiveSet such that flush[X] < j(P). 

Proof. It follows from Lemma [gflUD that there is a live process X ∈ LiveSet ∩ U_e, and it follows 
from the definition of U_e that j(X) < j(P). Therefore P is not an original process and it has a 
parent E. P starts life with a state identical to the state of E at v_{j(P)}(E)^{PR}. 

Process X can only be in LiveSet if it is there originally or if P receives a join notification for X. 
Since the latter does not happen in this case we must have X G LiveSetp@j,^jj,j(p)PR and by Lemma 
151(51) flus/7[X]^Q^ exists. By Lemma IHIIUI 171151 and ITl) 

< flush.heightx@y.^p^(^x)P^ < icur_view+ v.gap)x@^.^^^^xr^ < ji.P) 
Therefore the initial value of flush[X] at P is smaller than j(P). 

Scanning the pseudo-code reveals that this initial value of flush[X] does not change in P unless P 
receives a notification of the removal of X (which does not happen in this case) or if P processes 
a co-donation packet from X (which also does not happen, because co-donations are always sent 
from a late joining process to an existing group member), or if P processes a flush or a donation 
packet from X. The donation packet to P is the first packet that X queues to XP and therefore 
the first packet from X that P processes. Since e occurs before the donation packet is processed, 
the initial value of flush[X] is still extant and we are done. □ 

Lemma 9. Let P and Q be two original processes. Suppose that P sends a message packet 
p_msg(msg) to Q. When P queues the packet, it increments its mpkt_out and places msg in its 
WaitSet together with an index value equal to its updated value of mpkt_out (see the |protBroadcast| 
procedure for original message broadcasts and the |protRemove| procedure for forwarded broadcasts). 
If Q processes the message it increments the value of mpkt_in[P]. Then: 

1. Before Q processes the message, its value of mpkt_in[P] is lower than index. 

2. After Q processes the message, its value of mpkt_in[P] becomes equal to index. 

Moreover, if P = Q then the conclusion holds without the requirement that P be original. 

Proof. Both P and Q are original. Therefore Q is a member of ContactSet(P) from the start (see 
the |protStart| procedure). Therefore every message that was queued by P prior to msg had Q in 
its recipient list and since channels are FIFO, all of these messages are processed by Q before msg 
is processed. The mpkt_out variable in P is incremented every time a message packet is multicast 
by P (at the .b or .f component, according as the message is original or forwarded), and similarly 
mpkt_in[P] is incremented by Q every time a message from P is processed (at the .b or .f component, 
according as the message is original or forwarded). Initially both variables are equal to zero (at both 
components) at both P and Q (see the |protStart| procedure). Therefore they remain in lockstep as 
claimed. 

Almost the same argument works when P = Q and P is not an original process. We just have to 
verify two things. One, that P ∈ ContactSet(P) from the start, as one can verify by looking at the 
|protRun| procedure. Two, that initially both mpkt_out and mpkt_in[P] are equal at P. This can 
also be verified by looking at the |protRun| procedure. □ 
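
The lockstep argument of Lemma 9 can be illustrated with a minimal sketch. It is not the paper's pseudo-code: the class names `Sender` and `Receiver` and the function `run_lockstep` are hypothetical, the .b and .f components are collapsed into a single counter, and the FIFO channel is modeled as a plain queue.

```python
from collections import deque


class Sender:
    """Hypothetical stand-in for process P (only the counters of Lemma 9)."""

    def __init__(self):
        self.mpkt_out = 0      # starts at zero, as in protStart
        self.wait_set = {}     # index -> msg

    def queue_message(self, msg, channel):
        # Increment mpkt_out, then record msg with the updated value as index.
        self.mpkt_out += 1
        self.wait_set[self.mpkt_out] = msg
        channel.append((self.mpkt_out, msg))


class Receiver:
    """Hypothetical stand-in for process Q tracking mpkt_in[P]."""

    def __init__(self):
        self.mpkt_in = 0       # starts at zero, as in protStart

    def process(self, channel):
        index, _msg = channel.popleft()   # FIFO: oldest packet first
        before = self.mpkt_in
        self.mpkt_in += 1
        return before, index, self.mpkt_in


def run_lockstep(n):
    """Queue n messages at P, process them all at Q, and report the
    (value before, index, value after) triple for each message."""
    p, q, channel = Sender(), Receiver(), deque()
    for i in range(n):
        p.queue_message(f"m{i}", channel)
    return [q.process(channel) for _ in range(n)]
```

For every triple returned by `run_lockstep`, the value before processing is lower than the index and the value after processing equals it, which are the two conclusions of the lemma in this simplified setting.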

Lemma 10. Let P and Q be two original processes. Suppose that P sends a flush packet k = 
p_flush(v) to Q. When P queues the packet, it increments its flush_height (see the |CheckFlush| 
procedure). If Q processes the flush it increments the value of flush[P]. Then: 

$$\text{flush}[P]_{Q@k^{PR}} < \text{flush\_height}_{P@k^{QU}} = \text{flush}[P]^{Q@k^{PR}} = v$$

Moreover, if P = Q then the conclusion holds without the requirement that P be original. 

The exact same claim is true if flush is replaced by ghost throughout. 

Proof. We prove the lemma for the flush case. The ghost case is identical using the appropriate 
substitutions. Both P and Q are original. Therefore Q is a member of ContactSet(P) from the 
start (see the |protStart| procedure). Therefore every flush that is queued by P prior to p_flush(v) 
had Q in its recipient list and since channels are FIFO, all of these flushes are processed by Q 
before p_flush(v) is processed. The flush_height variable in P is increased to be equal to w every 
time a flush of height w is broadcast by P (see the |CheckFlush| procedure), and therefore flush[P] 
is increased by Q every time a flush from P is processed (see the |ReceiveFlush| procedure). Initially 
both variables are equal to zero at both P and Q (see the |protStart| procedure). Therefore they 
remain in lockstep as claimed. 

Almost the same argument works when P = Q and P is not an original process. We just have to 
verify two things. One, that P ∈ ContactSet(P) from the start, as one can verify by looking at the 
|protRun| procedure. Two, that initially both flush_height and flush[P] are equal at P. This can also 
be verified by looking at the |protRun| procedure. □ 

4.2 Side Effects of CBCAST Triggers 

As we have seen, each trigger event - a notification event, a packet processing event, or a message 
broadcast request event - causes a CBCAST callback to be invoked. Each invocation can cause zero 
or more packets to be queued on various channels - in other words the invocation causes side effects. 
We are now going to characterize the side effects of each type of trigger in detail. 


4.2.1 Side effects of message broadcast request events 

Message broadcast requests are processed through the |protBroadcast| procedure. The following 
lemma details the possible side effects of this procedure call. 

Lemma 11. An invocation of the |protBroadcast| procedure results in exactly one of the following 
outcomes: 

• No additional packet queuing, if v_gap > 0. 

• A message packet multicast if v_gap = 0. 

Proof. Obvious from the code of |protBroadcast|. □ 
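
The two outcomes of Lemma 11 can be sketched as follows. This is an illustration under assumptions, not the paper's |protBroadcast| code: the function name `prot_broadcast` and the dictionary-based state are hypothetical, and we assume that a request arriving while v_gap > 0 is parked in LaunchQueue until the pending views are installed.

```python
def prot_broadcast(state, msg):
    """Return the list of packets queued by one broadcast request.

    state is a hypothetical dict with keys 'v_gap' and 'launch_queue'.
    """
    if state["v_gap"] > 0:
        # Pending views exist: no packet is queued now (assumption: the
        # message waits in LaunchQueue for a later view installation).
        state["launch_queue"].append(msg)
        return []
    # No pending views: exactly one message packet multicast.
    return [("message", msg)]
```

With this model, the side effect of a broadcast request is fully determined by the sign of v_gap, as the lemma states.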


4.2.2 Side effects of view change notifications 


A view change notification from GMS is processed either through the protJoin or the protRemove 
procedures, depending on the type of view change. In addition, a joining process starts out life 
with an exact replica of the state of its parent, and immediately invokes the |protRun| procedure. 
The following lemma details the possible effects of these calls. 

Lemma 12. 1. An invocation of the |protJoin| procedure results in the queuing of a donation 
packet, followed by exactly one of the following outcomes: 

• No additional packet queuing, if FwdWaitSet is not empty. 

• A ghost packet multicast if FwdWaitSet is empty and BcastWaitSet is not empty. 

• A ghost packet multicast followed by a flush packet multicast, if both BcastWaitSet and 
FwdWaitSet are empty. 

2. An invocation of the |protRemove| procedure when process P is removed results in exactly one 
of the following outcomes: 

• A sequence of message broadcasts, one per message in FwdQueue[P], if FwdQueue[P] is 
not empty. 

• No additional packet queuing if FwdQueue[P] is empty, and FwdWaitSet is non-empty 
when |CheckFlush| is called. 

• A ghost packet multicast if FwdWaitSet is empty and BcastWaitSet is not empty when 
|CheckFlush| is called. 

• A ghost packet multicast followed by a flush packet multicast if both BcastWaitSet and 
FwdWaitSet are empty when |CheckFlush| is called. 

3. An invocation of the |protRun| procedure results in exactly one of the following outcomes: 

• No additional packet queuing if FwdWaitSet is non-empty. 

• A ghost packet multicast followed by a flush packet multicast if FwdWaitSet is empty. 

Proof. All these outcomes are easy to verify by tracing the code path in the respective calls. In the 
case of |protRemove| it is important to notice that if FwdQueue[P] is not empty, then the forwarded 
messages are placed in FwdWaitSet, which as a result is not empty when |CheckFlush| is called. □ 
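
The case analysis of Lemma 12 can be condensed into a small decision table. The sketch below is only a restatement of the lemma, not the protocol code: all function names are hypothetical, packets are represented by their kind only, and for |protRun| we assume that BcastWaitSet is empty because the application has not yet broadcast anything at the new process.

```python
def check_flush_outcome(fwd_wait_empty, bcast_wait_empty):
    """Packets multicast by CheckFlush, as characterized in Lemma 12."""
    if not fwd_wait_empty:
        return []
    if not bcast_wait_empty:
        return ["ghost"]
    return ["ghost", "flush"]


def prot_join_outcome(fwd_wait_empty, bcast_wait_empty):
    # A donation packet is always queued first.
    return ["donation"] + check_flush_outcome(fwd_wait_empty, bcast_wait_empty)


def prot_remove_outcome(fwd_queue_len, fwd_wait_empty, bcast_wait_empty):
    if fwd_queue_len > 0:
        # The forwarded messages land in FwdWaitSet, so CheckFlush adds nothing.
        return ["message"] * fwd_queue_len
    return check_flush_outcome(fwd_wait_empty, bcast_wait_empty)


def prot_run_outcome(fwd_wait_empty):
    # Assumption: BcastWaitSet is empty when a joining process starts.
    return check_flush_outcome(fwd_wait_empty, True)
```

Each function returns the sequence of side effects in queuing order, so the outcomes of the three parts of the lemma can be compared directly.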


4.2.3 Side effects of message and acknowledgement packet receipts 

A message receipt always results in the sending of an acknowledgement packet in response. An 
acknowledgement packet receipt causes a stabilization of a message with respect to the process that 
sent the packet. If this stabilization causes either FwdWaitSet or BcastWaitSet to empty out, it can 
cause the multicasting of ghost and flush packets. 

Lemma 13. 1. An invocation of the |ReceiveMessage| procedure results in the queuing of an 
acknowledgement packet targeted at the sender of the message. 

2. An invocation of the |ReceiveAck| procedure results in exactly one of the following outcomes: 

• No additional packet queuing if v_gap = 0. 

• No additional packet queuing if the acknowledgement does not cause either FwdWaitSet 
or BcastWaitSet to empty out, even if one or both were already empty. 

• No additional packet queuing if the acknowledgement causes BcastWaitSet to empty out 
and FwdWaitSet is non-empty. 

• A ghost packet multicast if v_gap > 0 and the acknowledgement causes FwdWaitSet to 
empty out and BcastWaitSet is non-empty. 

• A flush packet multicast if v_gap > 0 and the acknowledgement causes BcastWaitSet to 
empty out and FwdWaitSet is empty. 

• A ghost packet multicast followed by a flush packet multicast, if v_gap > 0 and the 
acknowledgement causes FwdWaitSet to empty out and BcastWaitSet is empty. 

Proof. This follows directly from direct observation and from Lemma [sp and El). □ 
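
The six outcomes of |ReceiveAck| listed in Lemma 13 can be written as a single classification function. This is a sketch under simplifying assumptions rather than the procedure itself: the name `receive_ack_outcome` is hypothetical, the wait sets are represented only by their sizes before the acknowledgement, and an acknowledgement is assumed to stabilize at most one record, in one of the two sets.

```python
def receive_ack_outcome(v_gap, fwd_size, bcast_size, stabilized):
    """Packets multicast after an acknowledgement is processed.

    fwd_size, bcast_size: sizes of FwdWaitSet and BcastWaitSet before the ack.
    stabilized: 'fwd', 'bcast' or None - the set that loses one record.
    """
    if v_gap == 0 or stabilized is None:
        return []
    if stabilized == "fwd":
        if fwd_size != 1:
            return []                      # FwdWaitSet does not empty out
        return ["ghost"] if bcast_size > 0 else ["ghost", "flush"]
    if stabilized == "bcast":
        if bcast_size != 1:
            return []                      # BcastWaitSet does not empty out
        return ["flush"] if fwd_size == 0 else []
    raise ValueError(f"unknown wait set: {stabilized!r}")
```

The ordering of the checks mirrors the lemma: nothing happens without pending views, and a flush is only released once both wait sets are empty.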

4.2.4 Side effects of ghost and flush packet receipts 

A ghost packet receipt does not cause any other packets to be sent. A flush packet, however, may 
cause one or more views to be installed. If that happens then one or more original messages from 
LaunchQueue may be broadcast. 

Lemma 14. 1. An invocation of the |ReceiveGhost| procedure does not result in additional packet 
queuing. 

2. An invocation of the |ReceiveFlush| procedure results in exactly one of the following outcomes: 

• No additional packet queuing if no views are installed or if LaunchQueue is empty. 

• One or more message packet multicasts if one or more pending views are installed and 
LaunchQueue is not empty. 

Proof. This follows from Lemma [5]([10]) and from direct inspection of the code of the relevant 
procedures. □ 
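
The interplay between view installation and LaunchQueue can be sketched as below. This is a hedged illustration, not the paper's |TryToInstall| code: the function name and the dictionary state are hypothetical, the installation test (every flush[] entry exceeds cur_view) is our assumption about what "all the flush[] values are high" means, and we assume LaunchQueue is drained only once v_gap reaches zero. What the sketch does take from the text is that cur_view is incremented and v_gap is decremented together, so their sum stays fixed.

```python
def try_to_install(state):
    """Install as many pending views as possible.

    state: hypothetical dict with 'cur_view', 'v_gap', 'flush' (dict of
    per-process flush heights) and 'launch_queue' (list of messages).
    Returns (number of views installed, list of message packets multicast).
    """
    installed = 0
    while state["v_gap"] > 0 and all(
        height > state["cur_view"] for height in state["flush"].values()
    ):
        # cur_view and v_gap always move together: the sum is invariant.
        state["cur_view"] += 1
        state["v_gap"] -= 1
        installed += 1

    sent = []
    if installed and state["v_gap"] == 0:
        # All pending views are installed: launch the parked messages.
        sent = [("message", msg) for msg in state["launch_queue"]]
        state["launch_queue"].clear()
    return installed, sent
```

In this model a call either leaves the state untouched or ends with an empty LaunchQueue whenever v_gap drops to zero, matching the way the claims about v_gap and LaunchQueue are used in the proofs.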


4.2.5 Side effects of donation and co-donation packet receipts 


The side effects of invoking the |ReceiveDonation| procedure are pretty straightforward - first a 
co-donation packet is sent and then a sequence of |ReceiveMessage| and |ReceiveAck| invocations are 
performed, each with its side effects that have already been characterized. The side effects of 
invoking the |ReceiveCoDonation| procedure are a bit more subtle, because this procedure has an 
additional call to |TryToInstall| at the end and there is an interplay between its side effects and the 
side effects of the rest of the procedure. 

Lemma 15. 1. An invocation of the |ReceiveDonation| procedure results in the queuing of a 
co-donation packet back to the sender of the donation packet, followed by a sequence of side 
effects for each of the |ReceiveMessage| and |ReceiveAck| invocations. 

2. An invocation of the |ReceiveCoDonation| procedure results in exactly one of the following 
outcomes: 

• Zero or more invocations of |ReceiveMessage| or |ReceiveAck| occur with their side effects, 
and |TryToInstall| fails to install a new view and has no side effects. 

• No invocations of either |ReceiveMessage| or |ReceiveAck| occur, and |TryToInstall| succeeds 
in installing all the views. As a result zero or more original messages from LaunchQueue 
are broadcast. 


Proof. The donation case is self evident. In the co-donation case, if v_gap = 0 (we will see later 
that this case does not actually happen) then |TryToInstall| does not install any views and it follows 
from Lemma [5]([10]) that LaunchQueue is empty and so |TryToInstall| has no side effects. 


Therefore the only non-trivial case has to do with a co-donation that starts with v_gap > 0 and 
results in a successful new view installation. Suppose that the sender of the co-donation is G and 
the receiver is P. Then a successful installation requires, at P: 

$$\text{flush}[G] = \text{cur\_view} + \text{v\_gap}$$
$$\text{flush}[P] = \text{cur\_view} + \text{v\_gap}$$

From Lemma [5]([5]) we know that at P 

$$\text{flush}[P] \le \text{flush\_height} \le \text{cur\_view} + \text{v\_gap}$$

Therefore flush[P] = flush_height = cur_view + v_gap and since v_gap > 0 it follows from the same 
lemma (part ([9])) that WaitSet is empty and as a result UNT_P is empty. 

Looking at the |ReceiveCoDonation| procedure one sees that P updates its value of flush[G] just before 
invoking |TryToInstall|, setting it to be equal to the co-donated value of flush_height in G. Therefore, 
at the moment that G sends the co-donation, it has flush_height(G) = cur_view(P) + v_gap(P). In 
other words, if we denote by d the donation packet that P sends to G and by c the co-donation 
packet that G sends to P then 

$$\text{flush\_height}_{G@d^{PR}} = (\text{cur\_view} + \text{v\_gap})_{P@c^{PR}}$$

The Piggyback Axiom and Lemma [51[T]) imply that the co-donation packet cannot be processed 
when P has an overall view height that is lower than that of G and therefore 

$$(\text{cur\_view} + \text{v\_gap})_{P@c^{PR}} \ge (\text{cur\_view} + \text{v\_gap})_{G@d^{PR}}$$

therefore 

$$\text{flush\_height}_{G@d^{PR}} \ge (\text{cur\_view} + \text{v\_gap})_{G@d^{PR}}$$

From Lemma [5]([5]) it follows that 

$$\text{flush\_height}_{G@d^{PR}} \le (\text{cur\_view} + \text{v\_gap})_{G@d^{PR}}$$

and so we can conclude that 

$$\text{flush\_height}_{G@d^{PR}} = (\text{cur\_view} + \text{v\_gap})_{G@d^{PR}}$$

From Lemma [SHU and [T|) it follows that P ∉ ContactSet_{G@d^{PR}}. From Corollary [5] it follows that 
flush[P]_{G@d^{PR}} < (cur_view + v_gap)_{G@d^{PR}}. From Lemma |51IS|) it follows that v_gap_{G@d^{PR}} > 0. 

Now we can use Lemma ISP to conclude that WaitSet_{G@d^{PR}} is empty. Therefore P receives an 
empty WaitSet from G as part of its co-donation, and therefore UNT_G is empty and we are done. □ 


4.3 CBCAST is Vacuum Convergent 

In our analysis of CBCAST we want to take advantage of the main finding of Section [51 namely the 
Fault Theorem (Theorem [IJ, and limit our attention to transactional histories. In order to do that 
we have to prove that CBCAST is vacuum convergent (see Definitions USD. 

Theorem 3. The CBCAST protocol is vacuum convergent. 


Proof. Let P be a halting process in a CBCAST based conforming history. Look at step (O of the 
vacuum loop (see Definition fT^ . By the Conforming GMS Axiom process P has a finite view interval 
and therefore this step can only increment v a finite number of times. Afterwards either the loop 
exits or step m is not executed again. Assume that the loop never exits. Once v stabilizes, steps 
m and o are executed once and are not executed again. Therefore after a finite number of 
steps the vacuum loop degenerates to repeated executions of step ([S|) which consist of: processing 
packets on the self channel; queuing side-effect packets to their respective channels, including the 
self-channel; sending and receiving the queued packets on downstream channels, including the 
self-channel; processing the newly received packets on the self-channel; etc. Since donation and 
co-donation packets are never queued to the self-channel, at this point such packets are not processed 
by the vacuum loop anymore. 

From Lemma [5] we know that the values of ghost_height and flush_height in P cannot rise above v. 
Since the |CheckFlush| procedure only generates ghost and flush packets when the ghost and flush 
height rise, the vacuum loop can only generate a finite number of such packets. Therefore after 
a finite number of steps no such packets are generated anymore. Therefore after some more time 
passes the vacuum loop processes the last of these packets and does not process any ghost or flush 
packets afterwards. 

Message packets are generated as a result of the processing of 

• a message broadcast request (when v_gap = 0) 

• a flush packet (when the flush causes view installations and LaunchQueue is not empty) 

• a co-donation packet (ditto) 

• a removal notification (when FwdQueue is not empty) 

Since at this point the vacuum loop does not process any additional items of these types, it only 
accumulates a finite number of message packets and as a result after some point it processes the 
last message packet and does not process any more such packets afterwards. 

Acknowledgement packets are generated as a result of the processing of message packets, donation 
packets and co-donation packets. Therefore the vacuum loop processes a finite number of those 
packets as well. 

So at some point the vacuum loop runs out of packets to process and is forced to fall through to 
step ©, contrary to our assumption. □ 


5 The History Reduction Mapping 

5.1 Introduction 

In this section we demonstrate the rather surprising fact that any transactional history of the CBCAST 
protocol that contains at least one join notification can be reduced to an alternate conforming history 
of CBCAST that performs the same computation and where the first join notification is replaced with 
a removal notification. This allows us to restrict our analysis to histories that contain no join 
notifications. 


We will construct the reduced history explicitly. We will start with the original history, and make 
careful changes to it. The construction will revolve around the first joining process G, its parent D, 
and the join view of G, which we call the critical view. The idea is to declare that G is an original 
group member (member since view zero) and that it has a doppelganger -G that is also a member 
of view zero. Then instead of having G join the group, we have -G leave the group. We need to 
accomplish this while not violating the CBCAST protocol, and without affecting the user application. 
In fact if we have any hope of success, the user application must be completely oblivious to the 
change. To create the reduced history we will have to solve two problems. First we will have to 
create a whole new history for ±G during the pre-critical interval, namely the interval that precedes 
the critical view change. Then we will have to resolve the race conditions that occur as a result 
of untimely packets, namely packets that are sent before the critical view change notification and 
arrive afterwards. 

We will solve the first problem by using D as the pre-critical template for ±G. This means that every 
reactive move by D, like receiving or acknowledging a message, will be copied by ±G verbatim. 
However we will not copy proactive moves by D, namely original message broadcasts that are 
initiated by the application at D. During the pre-critical period, ±G will be passive participants. 
We will ensure that the APP thread does not run there (and therefore does not generate message 
queuing requests) by artificially delaying the launch of the APP thread until after the critical moment. 
This is possible to arrange because the launch of the APP thread in |protStart| is asynchronous. 

The second problem will be solved with the help of the donation/co-donation protocol. This protocol 
is carefully tailored to provide precise compensation for critical boundary race conditions. In the 
reduced history the critical donation and co-donation packets will be eliminated. All the simulated 
packet processing that occurs during the |ReceiveDonation| and |ReceiveCoDonation| procedure calls 
will be replaced with the receipt and processing of actual, newly-minted untimely packets. 

The new packets and events will have to be added and timed in a very precise manner so that we 
neither violate CBCAST nor affect the user application. For example, pre-critical flush packets from 
±G will have to arrive slightly earlier than their counterparts from D, in order to prevent them 
from causing the receiving process to install a new view. But pre-critical forwarded messages from 
±G will have to arrive slightly later than their counterparts from D, in order to guarantee that 
they are redundant, and therefore ignored by the receiving process. We do not want ±G to rock 
the boat prematurely. 

To manage this careful surgery we need quite a bit of control. We will gain this control by creating 
a 4-coordinate label for each event. The label will describe which transaction the event belongs to 
and whether it is the transaction trigger or a side effect. It will describe whether the event was 
moved slightly forward or backward to ensure redundancy. For events that occur during a donation 
or co-donation transaction, the label will also describe which simulated sub-transaction the event 
is related to. The most important fact about the labels is that they all come from a single partially 
ordered label space that is common to the original and reduced histories. The use of a single label 
space for both histories will allow us to relate the order in both, and ultimately to use induction 
over the shared label space to prove that both histories track each other closely and ultimately 
converge. 

Throughout this part of the paper H is a fixed transactional history that includes a first join view 
v_crit. The process that joins at view v_crit is denoted G throughout, and the parent process of G is 
denoted by D. We will sometimes refer to H as the original history. We will sometimes refer to 
the synthesized history as the reduced history. 


5.2 Preliminaries 

5.2.1 Transactions 


A process executing the CBCAST protocol invokes a procedure in reaction to each triggering event. 
A view change notification event triggers an invocation of the |protStart|, |protRun|, |protJoin| or 
|protRemove| procedures. A packet processing event causes the invocation of the |ReceiveMessage|, 
|ReceiveAck|, |ReceiveGhost|, |ReceiveFlush|, |ReceiveDonation| or |ReceiveCoDonation| procedures. A 
message broadcast request event causes an invocation of the |protBroadcast| procedure. The procedure 
call in turn causes zero or more side effects in the form of packet queuing events. This 
sequence of events, starting with a trigger and continuing with a finite number (possibly zero) of 
side effects is a transaction. Because each process P runs its CBCAST procedures in a critical section, 
the events at each transaction occur as a contiguous sequence in Ep and therefore transactions can 
be read out of the history H directly. Compare this with the model-based definition of the notion 
of transaction in Section o 
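
Reading transactions out of a per-process event sequence can be sketched as follows. The representation is hypothetical (events as `(kind, payload)` pairs, with `kind` either `'trigger'` or `'side_effect'`); the only property taken from the text is that each trigger is followed contiguously by its own side effects.

```python
def split_transactions(events):
    """Group a contiguous per-process event sequence into transactions.

    events: list of (kind, payload) pairs in the order they occur at one
    process, where kind is 'trigger' or 'side_effect'.
    Returns a list of dicts, one per transaction.
    """
    transactions = []
    for kind, payload in events:
        if kind == "trigger":
            transactions.append({"trigger": payload, "side_effects": []})
        elif kind == "side_effect":
            if not transactions:
                raise ValueError("side effect without a preceding trigger")
            transactions[-1]["side_effects"].append(payload)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return transactions
```

Because the procedures run in a critical section, no interleaving has to be untangled: a single left-to-right pass suffices.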


5.2.2 A clean event order 


The partial event order ≺ in H is a bit too weak for our labeling needs. Our analysis in 2.4 shows 
that this ordering is compatible with event views: a high-view event cannot precede a low-view 
event. But it does not have to succeed it either. Also, given the fact that all the view change 
notification events of a single view can be viewed as occurring at the same time, it would be 
convenient to collapse them into a single event. This is exactly what we will do now. 

Definition 19. The Clean Event Set (Ē, ≺) is the partially ordered set obtained from the set E 
of events in H by collapsing all the notification events of each view i into a single element ε_i, with 
the induced partial order. Formally: 

$$\bar{E} = \{\varepsilon_i\}_{0 \le i < \omega+1} \;\cup \bigcup_{0 \le i < \omega+1} K_i \setminus G_i$$

and e ≺ f holds in Ē exactly when one of the following is true: 

• e = ε_i and f = ε_j, and i < j 

• e = ε_i and f ∈ K_j \ G_j, and i ≤ j 

• e ∈ K_i \ G_i and f = ε_j, and i < j 

• e ∈ K_i \ G_i and f ∈ K_j \ G_j, and i < j 

• e, f ∈ K_i \ G_i and e ≺ f 



Notice that since notification events are necessarily trigger events, the elements of E can be divided 
into trigger events (which include all the ε_i events) and side-effect events. 
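
The collapsing step of Definition 19 can be illustrated with a short sketch. It only builds the element set, not the partial order, and its representation is hypothetical: events are `(event_id, view, is_notification)` triples, and the collapsed notification element of view i is represented by the pair `("view", i)`.

```python
def clean_event_set(events):
    """Collapse the notification events of each view into one element.

    events: iterable of (event_id, view, is_notification) triples.
    Returns the clean elements in first-seen order: ('view', i) for the
    collapsed notifications of view i, ('event', event_id) otherwise.
    """
    clean = []
    seen_views = set()
    for event_id, view, is_notification in events:
        if is_notification:
            if view not in seen_views:
                seen_views.add(view)
                clean.append(("view", view))
        else:
            clean.append(("event", event_id))
    return clean
```

Every notification of a given view, regardless of the process at which it occurs, maps to the same element, while all other events keep their identity.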

The transactions that are triggered by the critical donation and co-donation packets have a special 
place in our analysis, so we add some specific notation for them. 

Definition 20. 

• Let d ∈ PG be the critical donation packet sent by P. We denote Crit(P → G) = d^{PR} when 
d^{PR} exists. 

• Let c ∈ GP be the critical co-donation packet sent to P. We denote Crit(G → P) = c^{PR} 
when c^{PR} exists. 


5.2.3 Donation and co-donation sub-transactions 

The main concern of the |ReceiveDonation| and |ReceiveCoDonation| procedures is the execution of a 
sequence of |ReceiveMessage| and |ReceiveAck| calls. These calls or sub-transactions compensate for 
race conditions. When a new process joins, it is not possible to deliver to it an up-to-date consistent 
copy of ReplicatedData that will allow it to participate in the distributed computation without any 
further correction. The problem is that at the moment where the new process joins, some of the 
information required by it is trapped in packets that are in transit, carrying information that will 
arrive to the packet’s destination too late to be of help. 

What are these missing packets? The answer to this question is not easy. Naively, most of these are 
packets that were sent to or from the parent of the new process and failed to arrive by the moment 
of the view change, but the general picture is a lot more complicated because the parent itself may 
have joined the group in a previous view change, and the packet of interest may have been sent 
to/from an ancestor of the parent. Luckily, we only have to worry about the critical view change, 
which is the first process join in the history. This case, while not trivial, is considerably simpler. 
Our event labeling will reflect this fact, by labeling the sub-transactions of critical donations and 
co-donation transactions differently from non-critical ones. This way we can avoid an in-depth 
analysis of non-critical donations and co-donations, and let their meaning remain entangled in the 
magic of induction. 

In the following discussion we will use the shorthand notation 

||vect|| = vect.f + vect.b 

for vectors like mpkt_out, mpkt_in[] and index. 

Lemma 16. Let P be an original process (a member of view zero in H) and suppose that G processes 
a critical donation packet from P. Then there is a one-to-one, order preserving correspondence 
between the following two sequences: 

1. The sequence of |ReceiveMessage| invocations by G at labeled step [I] of the |ReceiveDonation| 
procedure as it processes the donation packet from P. 

2. The queuing events of untimely message packets in channel PD, ordered by ≺. Formally these 
are the events k^{QU} ∈ E that meet the following criteria: 

k ∈ PD 

k = p_msg(msg) 

v_crit ≺ k^{PR} if k^{PR} exists 

Proof. Let k be a packet that meets all the criteria of ([2]). Then it is a message packet k = p_msg(msg) 
that had been sent from P to D prior to the critical view change. When k is queued, P places 
a record (msg, index, iset) in its WaitSet. Since the packet does not make it to D prior to the 
critical moment, the message does not stabilize with respect to D and remains in WaitSet with a D 
instability at the critical moment. Moreover, it follows from Lemma [9] that at the critical moment 
at D, ||mpkt_in[P]|| < ||index||. Indeed we can say more than that: if k is the i-th packet in the 
sequence of untimely message packets in the PD channel, then ||mpkt_in[P]|| + i = ||index||. 

Process G starts life with the same value of mpkt_in[P], inherited from D. Since the |protRun| 
procedure does not touch this value, and since the critical donation packet from P is the first 
packet that G receives from P, this value remains unchanged until G executes the |ReceiveDonation| 
procedure. 

At the critical moment P executes the |protJoin| procedure. It adds a G instability to the record, 
with iset[G].f = iset[D].f and iset[G].b = 0. P then donates its state to G. When G processes the 
donation by invoking the |ReceiveDonation| procedure, it finds the record with its G-instability and 
copies it to the UNT_P set. G sorts the records in such a way that the records of untimely message 
packets from P are sorted in their broadcast order. 

To see why that is true, let k and k' be untimely message packets in PD and suppose that 
k'^{QU} ≻ k^{QU}. Then (iset[D].f)_{@k'} ≥ (iset[D].f)_{@k} because this value is initialized from mpkt_in[D].f 
(see the |protBroadcast| and |protRemove| procedures) which is monotone increasing. As a result 
(iset[G].f)_{@k'} ≥ (iset[G].f)_{@k} in the donated state from P and so 

height_1((msg', index', iset'[])) ≥ height_1((msg, index, iset[])) 

If these heights are equal the sorting proceeds based on height_2. This function is derived from 
the index component of the record, which is strictly increasing with each message broadcast at P. 
Therefore 

height_2((msg', index', iset'[])) > height_2((msg, index, iset[])) 

which proves that the records of the untimely message packets are sorted in the order at which 
those message packets were queued by P. 

Process G examines the records in the sorted UNT set one by one. Every time a record from UNT_P
is examined, its ||index|| is compared to ||mpkt_in[P]||. If ||index|| is bigger, the |ReceiveMessage|
procedure is invoked, causing ||mpkt_in[P]|| to be incremented by 1. It follows from Lemma O that
if the message is timely then the comparison will fail and the call will not be invoked. So the call
is invoked only for untimely message packets, in the right order, and causes ||mpkt_in[P]|| to be
incremented each time. If the call is invoked for all the untimely message packets up to the i-th one,
then ||mpkt_in[P]|| grows by i − 1 up to that point, which is not enough to prevent the comparison
from succeeding for the i-th message packet. Therefore the call is invoked exactly for the records of
untimely message packets, in the correct order, as claimed. □
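The examination loop described in this proof can be illustrated with a small Python sketch. It is not the protocol's |ReceiveDonation| code: the Record fields, the function name replay_untimely and the use of plain integers for height_1, height_2 and ||index|| are assumptions made for illustration only, and appending to a list stands in for an actual |ReceiveMessage| invocation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a (msg, index, iset) record."""
    msg: str
    height1: int      # assumed integer value of height_1 (derived from iset)
    height2: int      # assumed integer value of height_2 (derived from index)
    index_norm: int   # assumed integer value of ||index||


def replay_untimely(records: List[Record], counter: int) -> Tuple[List[str], int]:
    """Examine records in (height1, height2) order.

    `counter` plays the role of ||mpkt_in[P]||. A record is replayed only
    when its ||index|| is bigger than the counter, and every replay
    increments the counter by 1.
    """
    invoked = []
    for rec in sorted(records, key=lambda r: (r.height1, r.height2)):
        if rec.index_norm > counter:
            invoked.append(rec.msg)  # stands in for a ReceiveMessage call
            counter += 1
    return invoked, counter


# Two timely records (||index|| <= 3) and three untimely ones, in scrambled order.
example = [
    Record("m5", 1, 5, 5),
    Record("m2", 0, 2, 2),
    Record("m6", 2, 6, 6),
    Record("m4", 1, 4, 4),
    Record("m3", 0, 3, 3),
]
invoked, final_counter = replay_untimely(example, counter=3)
```

In this toy run the counter starts at 3, the i-th untimely record has ||index|| = 3 + i, and the loop replays exactly the three untimely records in their queuing order while skipping the timely ones.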

Lemma 17. Let P be an original process (a member of view zero in H) and suppose that G processes
a critical donation packet from P. Then there is a one-to-one, order preserving correspondence
between the following two sequences:

1. The sequence of |ReceiveAck| invocations by G at a labeled step of the |ReceiveDonation| procedure
as it processes the donation packet from P.

2. The queuing events of untimely forwarded acknowledgement packets in channel PD, ordered
by ≺. Formally these are the queuing events in E of the packets k that meet the following criteria:

k ∈ PD

k = p_ACK(msg)

ORIG(msg) ≠ D

^vcrit {Dy^ k^^ if k^^ exists 


Proof. Let k be a packet that meets all the criteria of ([2]). Then this packet had been sent from
P to D prior to the critical view change, in reaction to the receipt of a message msg from D. The
message must be forwarded because it did not originate with D. When D sends the message packet
to P, it places a record (msg, index, iset) in its FwdWaitSet (see labeled step [6] of the |protRemove|
procedure). Let F_msg = index.f.


As the critical parent, D is an original process and Lemma [5] applies. Therefore after P executes
the |ReceiveMessage| procedure, its value of mpkt_in[D].f is equal to F_msg. Let F_crit be the value of
mpkt_in[D].f at P at the critical moment. Since the receipt of the message msg at P takes place
pre-critically, F_crit ≥ F_msg. At the critical time P copies mpkt_in[D].f to mpkt_in[G].f (while zeroing
out the .b component) before sending its donation to G (see the |protJoin| procedure).


Since the acknowledgement packet for this message does not arrive by the critical moment, the
record remains with P in its instability set (iset) and G starts life with the same record in its own
FwdWaitSet, after inheriting it from D. Right away G zeroes out index.b in the record (see the
|protRun| procedure). Since the donation packet is the first packet ever received at G from P, the
record must remain in FwdWaitSet at G until that moment, because there is no way until that time
for the P-instability to be resolved.


Following the execution of the |ReceiveDonation| procedure the record of msg in FwdWaitSet makes
it into the UNT_G set. Moreover, the value of the index.f = F_msg component of the record is no
higher than the critical value of mpkt_in[G].f = F_crit at P, which is exactly the value that arrives
at G together with P's donation. The .b field in both values is zero. Therefore the |ReceiveAck|
procedure gets invoked on behalf of msg.


If k' is a packet that meets the same criteria and has a later queuing event than k, then k' =
p_ACK(msg') where msg' is sent from D to P after msg is sent. Therefore the value of ||index|| for
msg' is higher and therefore the related sub-transaction is executed later, so the mapping in this
direction is order preserving.


The inverse correspondence is constructed in a similar way. Let (msg, index, iset) be a record in
UNT_G that passes the comparison test


||index|| ≤ ||donation.mpkt_in[G]||


From the construction of UNT_G it follows that iset[P] exists. Therefore the message msg got into
WaitSet after a message packet queuing event that included P in its target set. When and where
did this queuing event happen? If it happened at G, then index must be strictly bigger than the
initial value of mpkt_out at G. This initial value is equal to zero in its .b component (see the
|protRun| procedure) while its .f component is equal to the critical value of mpkt_out.f at D. It
follows from Lemma [5] that F_crit, the critical value of mpkt_in[D].f at P, is at most as high as the
critical value of mpkt_out.f at D. Therefore index.b + index.f > F_crit. But as we have already seen,
F_crit = donation.mpkt_in[G].f and donation.mpkt_in[G].b = 0. This is a contradiction. Therefore
the queuing event does not happen at G but rather at D, and the WaitSet record in question is
inherited by G when it is created. From this it follows that the message did not originate with D.
If it did, then the record would go to BcastWaitSet in D, and this part of WaitSet is not inherited
by G but instead zeroed out (see the |protRun| procedure). So ORIG(msg) ≠ D.


So the message msg is forwarded from D to P, and is not stabilized with respect to P by the critical
moment. Why is it not stabilized? Either it does not arrive at P on time (in which case the message
packet from D to P is untimely), or else it does arrive on time but the acknowledgement packet
from P to D is untimely. If the first case holds then by Lemma [5] F_crit < index.b + index.f. As we
saw, this leads to a contradiction. Therefore the acknowledgement packet k = p_ACK(msg) that was
sent from P to D was untimely. This means that the packet k meets all the criteria listed in the
statement of the lemma.


It is obvious that these two correspondences are inverses of each other and so we are done. □ 


Corollary 6. Let P be an original process and suppose that G processes a critical donation packet
from P. Then there is a one-to-one, order preserving correspondence between the following two
sequences:

1. The sequence of |ReceiveMessage| and |ReceiveAck| invocations by G at labeled step [7] and at the
labeled |ReceiveAck| step of the |ReceiveDonation| procedure as it processes the donation packet from P.

2. The queuing events of untimely message packets and forwarded acknowledgement packets in
channel PD, ordered by ≺.


Proof. Most of the claim follows directly from Lemmas [16] and [17]. But we still have to show
that if k, k' ∈ PD are an untimely message packet and an untimely acknowledgement packet for a
forwarded message respectively, then the queuing event of k precedes the queuing event of k' if and
only if the corresponding records in UNT_P and UNT_G are processed in the same order.

Let k = p_MSG(msg) and k' = p_ACK(msg'). Let index' be the value of mpkt_out in D when it queues
p_MSG(msg'). This is the value of the index field of the record of msg' in WaitSet(D).

Suppose first that the queuing event of k precedes that of k'. Then P queues k before it processes
p_MSG(msg'). By Lemma [SI at the time that P queues k, we have mpkt_in[D].f ≤ index'.f at P. But in
fact the inequality is strict here because the message msg' is forwarded by D, not originated.
Because of that it is mpkt_out.f that is incremented when p_MSG(msg') is queued and mpkt_in[D].f
that is incremented when the same packet is processed. Let r, r' ∈ UNT be the records that
correspond to k and k' respectively. We have already seen that height_1(r) = mpkt_in[D].f and
index(r').f = index'.f. Therefore

height_1(r) = mpkt_in[D].f < index'.f ≤ height_1(r')

which proves one direction of the claim.

To prove the other direction, suppose that the queuing event of k follows that of k'. Then P queues
k after it processes p_MSG(msg'). By Lemma ini at the time that P queues k, we have
mpkt_in[D].f ≥ index'.f and following the same logic as before we conclude that
height_1(r) ≥ height_1(r'). In this case however we cannot guarantee a strict inequality. In case
there is equality, however, the order between r and r' is determined by height_2. Since r' ∈ UNT_G
we have height_2(r') = 0 while r ∈ UNT_P and therefore

height_2(r) = ||index(r)|| ≥ index'.f > 0 = height_2(r')

where the rightmost inequality is strict because msg' is a forwarded message. □

Lemma 18. Let P be an original process (a member of view zero in H) that processes a critical 
co-donation packet from G. Then there is a one-to-one, order preserving correspondence between 
the following two sequences: 

1. The sequence of |ReceiveMessage| invocations by P at labeled step [7] of the |ReceiveCoDonation|
procedure as it processes the co-donation packet from G.

2. The union of the following two sets of events, ordered by ≺:

(a) The queuing events of untimely forwarded message packets in channel DP. Formally
these are the queuing events in E of the packets k that meet the following criteria:

k ∈ DP

k = p_MSG(msg)

ORIG(msg) ≠ D

VvcritiPT^ ~^k^^ if k^^ exists 


(b) The post-critical, pre-P-donation queuing events of message packets in channel GP.
Formally these are the queuing events in E of the packets k that meet the following criteria:

k ∈ GP

k = p_MSG(msg)

^ Crit(P^G) 


Note: The union of these two sets of events is linearly ordered by -< because according to the 
Parent Axiom ^ Grun = I'vcrit 

Proof. Let k be a packet that meets all the criteria of (2a). This packet was sent from D to P
prior to the critical view change, containing a forwarded message msg. When D queued the packet
it placed a record (msg, index, iset) in its FwdWaitSet. Let F_msg = index.f. Let F_crit be the value of
mpkt_in[D].f at P at the critical moment. Then the fact that the packet is untimely, together with
Lemma ini, implies that F_msg > F_crit. We can say more than that: if k is the i-th packet in the sequence
of untimely forwarded message packets in the DP channel, then F_msg = F_crit + i. At the critical
moment process P creates mpkt_in[G].f = F_crit. At the critical moment the value of mpkt_out.f at
D is equal to F_crit + u where u is the number of untimely forwarded message packets in the DP
channel. As a result G starts life with ||mpkt_out|| = F_crit + u (the |protRun| procedure zeroes out
mpkt_out.b).
Since the packet k is untimely, the record remains in FwdWaitSet, with P in its instability set, until
the critical moment. Therefore G starts life with the same record in its own FwdWaitSet, and the
|protRun| procedure leaves index.f untouched. Since G does not receive any packet from P until it
receives its donation, the record remains in FwdWaitSet at G until Crit(P → G) and therefore it is
sent to P as part of the co-donation from G. As a result, when P processes the co-donation from
G it finds the record with its P-instability, and places it in UNT_G.


P sorts the records in such a way that the records of untimely message packets from D are sorted
in their broadcast order. To see why that is true, let k and k' be untimely forwarded message
packets in DP such that the queuing event of k' follows that of k. Then
||(iset[P])@k'|| ≥ ||(iset[P])@k|| at D because this value is initialized from ||mpkt_in[P]|| (see the
|protRemove| procedure) which is monotone increasing. The value of iset is not changed by the
|protRun| procedure so the same relationship holds in G and so this is what P sees in the
co-donated state that it receives from G. Therefore


height_1((msg', index', iset'[])) ≥ height_1((msg, index, iset[]))

If these heights are equal the sorting proceeds based on height_2. This function is derived from
the index component of the record. The |protRun| procedure zeroes out index.b but does not touch
index.f. The original value of index.f is derived from the value of mpkt_out.f at D. Since k is
a forwarded message packet, the value of mpkt_out.f is increased after D queues k, and therefore
index'.f > index.f. Therefore at G, where the .b component is zero for the records of both k and k',
we have ||index'|| > ||index|| and therefore

height_2((msg', index', iset'[])) > height_2((msg, index, iset[]))

which proves that the records of the untimely message packets are sorted in the order in which
those message packets were queued by D.

Now let us examine the other subset of packets. Let k be a packet that meets all the criteria of
(2b). When G queues k it does not have P in ContactSet, because ContactSet is initialized in G to
contain only itself (see the |protRun| procedure), and P is only added to the set when G processes
the critical donation from P. As a result G does not send a message packet to P even though
it does create an iset[P] entry in the instability set of the message in its WaitSet. (To see that,
check the |protBroadcast| and |protRemove| procedures to see that message packets are only sent to
contacted processes, while instability is created for each process that has an entry in the mpkt_in
vector. According to Lemma I5t|5|) mpkt_in has entries exactly for the live processes.) This makes it
impossible for the record (msg, index, iset) of the message to stabilize with respect to P by the time
that G processes the donation from P, and as a result the record survives in the co-donated state
that G sends to P, together with its P-instability. As a result P copies the record into UNT_G.

The value of index in the record is initialized from the value of mpkt_out at G. As we noted before
G starts life with ||mpkt_out|| = F_crit + u where u is the number of packets meeting the criteria of
(2a). Since mpkt_out is incremented by G every time it queues message packets, we can conclude that
the j-th packet that meets the criteria of (2b) has a record with ||index|| = F_crit + u + j. In other words
if we take the two sequences (2a) and (2b) as one, then the record of the i-th packet in that sequence
has ||index|| = F_crit + i.

We want to show that these records are also sorted by P according to their ≺-order with respect
to each other and also with respect to the records of the untimely DP packets.
Let k and k' be two packets in GP that meet the (2b) criteria, and let k° be a packet in DP that
meets the (2a) criteria. k and k' are post-critical and both have records that retain their P-instability.
Moreover, iset[P] is initialized in both cases from mpkt_in[P] at G, a value that starts out being
equal to the critical value of mpkt_in[P] at D. This value is untouched by the |protRun| procedure,
and remains unchanged until the moment that G processes the donation from P, because there are
no incoming message packets from P to G until that time. Therefore iset[P] = iset'[P]. Similarly,
k° gets its value of iset[P] from a pre-critical value of mpkt_in[P] at D. Therefore

height_1((msg', index', iset'[])) = height_1((msg, index, iset[])) ≥ height_1((msg°, index°, iset°[]))

So the height_1 function does not resolve the order between the records of k and k', and if it resolves
it between k° and k it does so in the desired way.

Both k and k' derive their index value from the value of mpkt_out at G, which increases with each
message packet queuing event. Therefore, if k' is queued after k,

height_2((msg', index', iset'[])) > height_2((msg, index, iset[]))

and we have the desired order relation in this case. The record for packet k° at D derives its
index value from a pre-critical value of mpkt_out at D. G inherits mpkt_out from D with the .b
component zeroed out. G inherits the k° record and similarly zeroes out its index.b value.
Post-critically, mpkt_out at G continues to grow from this initial value with each message broadcast.
Therefore

height_2((msg, index, iset[])) > height_2((msg°, index°, iset°[]))

and we have the desired order in this case as well.

Process P examines the records in the sorted UNT set one by one. Every time a record from UNT_G
is examined, its ||index|| is compared to ||mpkt_in[G]||. If ||index|| is bigger, the |ReceiveMessage|
procedure is invoked, causing ||mpkt_in[G]|| to be incremented by 1.

There are exactly three types of records in UNT_G. There are records for message packets that were
queued by G itself, namely post-critical message packets that meet the criteria of (2b). Then there
are records that were inherited from D. These are records of pre-critical message packets. Some of
these packets are untimely and meet the criteria of (2a) while others are timely.

It follows from Lemma |3 that if the message is timely then the comparison will fail and the procedure
will not be invoked. So the procedure is invoked only for untimely and post-critical message packets,
in the right order, and causes ||mpkt_in[G]|| to be incremented each time. If the procedure is invoked
for all the untimely and post-critical message packets up to the i-th one, then ||mpkt_in[G]|| grows
by i − 1 up to that point, which is not enough to prevent the comparison from succeeding for the
i-th message packet. Therefore the procedure is invoked exactly for the records of message packets
that meet one of the two criteria, in the correct order, as claimed. □
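The counting step that closes this proof can be written out as a short worked inequality. This is only a restatement under the assumptions already made in the text. The symbol c_i is introduced here (it does not appear in the original) for the value of ||mpkt_in[G]|| at P just before the i-th qualifying record is examined, and index_i denotes the index of that record. The .b component of mpkt_in[G] is taken to be zero at the critical moment, so that c_1 = F_crit.

```latex
\begin{align*}
c_1 &= F_{crit}, \\
c_i &= F_{crit} + (i-1)
      && \text{after } i-1 \text{ invocations of ReceiveMessage}, \\
\lVert index_i \rVert &= F_{crit} + i \;>\; c_i
      && \text{so the comparison succeeds for the } i\text{-th record}, \\
c_{i+1} &= c_i + 1 \;=\; F_{crit} + i \;=\; \lVert index_i \rVert .
\end{align*}
```

Each successful comparison therefore leaves the counter exactly one short of the next qualifying record, which is what allows the next comparison to succeed in turn.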

Lemma 19. Let P be an original process (a member of view zero in H) that processes a critical 
co-donation packet from G. Then there is a one-to-one, order preserving correspondence between 
the following two sequences: 

1. The sequence of |ReceiveAck| invocations by P at a labeled step of the |ReceiveCoDonation|
procedure as it processes the co-donation packet from G.

2. The queuing events of untimely acknowledgement packets in channel DP, ordered by ≺. Formally
these are the queuing events in E of the packets k that meet the following criteria:

k ∈ DP

k = p_ACK(msg)

-< v^.rADr 

Vv„it{PT^ ^ k^^ if k^^ exists 

Proof. Let I_crit be the value of mpkt_in[P] at D at the critical moment. G inherits mpkt_in[P] from
D. This value is not changed by the |protRun| procedure, and it remains unchanged until the receipt
of a donation packet from P, since this is the first packet of any kind that G receives from P. Upon
receipt of the donation packet from P, G immediately sends a co-donation packet to P. Therefore
when P executes the |ReceiveCoDonation| procedure, we have:

||co_donation.mpkt_in[P]|| = ||I_crit||

Let k be a packet that meets all the criteria of ([2]). Then this packet was sent from D to P prior
to the critical view change, in reaction to the receipt of a message msg from P. When P sends the
message packet to D, it places a record (msg, index, iset) in its WaitSet. Let I_msg = index.

Since the acknowledgement packet for this message does not arrive by the critical moment, the
record remains in WaitSet at P, with D in its instability set until then. At that moment, P adds a
G-instability to the record (see labeled step [T] of the |protJoin| procedure). This additional instability
ensures that the record will remain in WaitSet until the first packet from G arrives. This packet is
the co-donation from G. Therefore when P processes the co-donation packet it discovers the record
in its WaitSet and copies it into UNT_P.

As the critical parent, D is an original process and Lemma [U] applies. Therefore after D executes
the |ReceiveMessage| procedure, its value of mpkt_in[P] is equal to I_msg. Since the receipt of the
message msg at D takes place pre-critically, ||I_crit|| ≥ ||I_msg||. Therefore:

||co_donation.mpkt_in[P]|| = ||I_crit|| ≥ ||I_msg|| = ||index||

Therefore the record results in the invocation of the |ReceiveAck| procedure at labeled step [2] of the
|ReceiveCoDonation| procedure.

If k' is a packet that meets the same criteria and has a later queuing event than k, then k' =
p_ACK(msg') where msg' is sent from P to D after msg is sent. Therefore the value of ||index|| for
msg' is higher and therefore the related sub-transaction is executed later, so the mapping in this
direction is order preserving.

The inverse correspondence is constructed in a similar way. Let (msg, index, iset) be a record in
UNT_P that passes the comparison test

||index|| ≤ ||co_donation.mpkt_in[P]||

From the construction of UNT_P it follows that iset[G] exists. Therefore the message msg got into
WaitSet after a message packet queuing event that included a packet being sent to G (post-critically)
or to D (pre-critically). We must eliminate the post-critical case. If the queuing event is post-critical
then index must be strictly bigger than the critical value of mpkt_out at P. Call this critical value
O_crit. It follows from Lemma [5] that O_crit ≥ I_crit. Therefore


||index|| > ||O_crit|| ≥ ||I_crit|| = ||co_donation.mpkt_in[P]||

contradicting our assumption. It follows that msg must have been sent pre-critically to D. We
know that D processed the message packet pre-critically because the relation

||index|| ≤ ||co_donation.mpkt_in[P]|| = ||I_crit||

implies it according to Lemma IHl. It is obvious that these two correspondences are inverses of each
other and so we are done. □


Corollary 7. Let P be an original process that processes a critical co-donation packet from G.
Then there is a one-to-one, order preserving correspondence between the following two sequences:


1. The sequence of |ReceiveMessage| and |ReceiveAck| invocations by P at the labeled steps of
the |ReceiveCoDonation| procedure identified in Lemmas 18 and 19, as it processes the co-donation
packet from G.


2. The queuing events of untimely forwarded message packets in channel DP, the queuing events
of untimely acknowledgement packets in channel DP and the queuing events of post-critical,
pre-P-donation message packets in channel GP, ordered by ≺.


Proof. Most of the claim follows directly from Lemmas [18] and [19]. But we still have to show that
if k, k' ∈ DP are an untimely forwarded message packet and an untimely acknowledgement packet
respectively, and if k'' ∈ GP is a pre-P-donation message packet then

• The queuing event of k precedes that of k' if and only if the corresponding records in UNT_G
and UNT_P are processed in the same order.

• The record for k'' in UNT_G is processed after the record for k' in UNT_P.

Let k = p_MSG(msg) and k' = p_ACK(msg'). Let index' be the value of mpkt_out in P when it queues
p_MSG(msg'). This is the value of the index field of the record of msg' in WaitSet(P).

Suppose first that the queuing event of k precedes that of k'. Then D queues k before it processes
p_MSG(msg'). By Lemma [21 at the time that D queues k, we have ||mpkt_in[P]|| < ||index'|| at D.
Let r, r' ∈ UNT be the records that correspond to k and k' respectively. We have already seen that
height_1(r) = ||mpkt_in[P]|| and index(r') = index'. Therefore

height_1(r) = ||mpkt_in[P]|| < ||index'|| = height_1(r')

which proves one direction of the claim for k and k'.

To prove the other direction, suppose that the queuing event of k follows that of k'. Then D queues
k after it processes p_MSG(msg'). By Lemma 121 at the time that D queues k, we have
||mpkt_in[P]|| ≥ ||index'|| and following the same logic as before we conclude that
height_1(r) ≥ height_1(r'). In this case however we cannot guarantee a strict inequality. In case
there is equality, however, the order between r and r' is determined by height_2. Since r' ∈ UNT_P
we have height_2(r') = 0 while r ∈ UNT_G and therefore

height_2(r) = ||index(r)|| > 0 = height_2(r')
We still have to show that the record for k'' is placed later than the record for k' in the sorted
order. As we saw before, G inherits the value of mpkt_in[P] without change, and since D processes
p_MSG(msg') pre-critically, the critical value of ||mpkt_in[P]|| at D is at least as high as
||index'||. Therefore at the moment that G queues k'' we have ||mpkt_in[P]|| ≥ ||index'|| and therefore
height_1(r'') ≥ height_1(r') where r'' is the record corresponding to k''. The rest of the argument is
the same as the similar argument for the case in which the queuing event of k follows that of k'. □

Lemma 20. Let P be an original process (a member of view zero in H) that processes a critical 
co-donation packet from G, and suppose that the invocation of |TryToInstall| at a labeled step of
the |ReceiveCoDonation| procedure results in the installation of the pending views at P. Let v =
cur_view + v_gap be the view height at P at the moment that it processes the critical co-donation.
Then G must have queued a p_FLUSH(v) packet prior to dequeuing the donation from P. Moreover,
after queuing that packet and until the donation from P is dequeued, G does not queue any ghost, 
flush or message packets. 

Proof. The proof of Lemma [15] shows that if |TryToInstall| actually installs the pending views, then
the value of flush_height at G must have been equal to v at the time that G processed the donation
from P. Let vq = flush.heightQ@.„^ (G)pr be the initial flush height at G (see Definition [T71). vq is 
equal to ghost_heightjj,^.„^ therefore vq < Vcrit according to Lemma |SJ|T] and |S]) . Lemma 

[5pD also implies that v > Vcrit because the co-donation is processed at P after the critical view 
change. This implies that flush_height at G increases between the time G is created and the time 
that it processes the donation from P. The only way for that to happen is for G to queue one 
or more flush packets. The last one of the flush packets before the receipt of the donation from 
P would reflect the value of flush_height at the time. Therefore this last flush packet must be
k_last = p_FLUSH(v). We have to show that G does not queue any additional regular packets between
k_last and the processing of the donation from P. We already know that G does not queue any
additional flush packets in that interval. From Lemma |H1IS|) and the Piggyback Axiom we know
that G also does not queue any ghost packets. It follows from the proof of Lemma jUj that G does
not queue any message packets either. □

5.3 The Label Space 

We construct an infinite, very well founded, partially ordered set ℒ, that we call the label space. We
use this space as a medium for inducing an ordering on the events in . We do it the following 
way. First we assign a label from ℒ to each event in H in an order preserving manner. Then we
expand the assignment to events in H'~. Then we use the latter labeling to induce an order on the 
events in . 

Each label is made up of four coordinates: the constellation coordinate, the sub-transaction
coordinate, the adjustment coordinate and the side-effect coordinate. Each coordinate comes from
its own very well founded partially ordered set, and the ordering on ℒ as a whole is left-handed
lexicographic, with the constellation coordinate being the major one.
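A minimal Python sketch of the left-handed lexicographic comparison may help fix ideas. It makes a simplifying assumption that the text does not: each coordinate is modelled by an integer rank, so every pair of labels is comparable, whereas the coordinate sets of the label space are only partially ordered. The class name Label and its field names are hypothetical.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Hypothetical four-coordinate label; integers stand in for the
    elements of the four coordinate sets (an assumption of this sketch)."""
    constellation: int     # major coordinate
    sub_transaction: int
    adjustment: int
    side_effect: int       # minor coordinate


# NamedTuple instances compare like plain tuples, that is, lexicographically
# from the leftmost field to the rightmost one.
a = Label(constellation=1, sub_transaction=9, adjustment=9, side_effect=9)
b = Label(constellation=2, sub_transaction=0, adjustment=0, side_effect=0)
c = Label(constellation=1, sub_transaction=9, adjustment=9, side_effect=10)

ordered = sorted([b, c, a])
```

Here a precedes c because they first differ in the side-effect coordinate, and both precede b because the constellation coordinate dominates all the others.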

All the events in a single transaction share the same constellation coordinate, but the converse is 
not true. It is possible for multiple transactions to share the same constellation coordinate. The 
set of all the transactions that share each constellation coordinate is called a constellation. In the
original history H only notification transactions are grouped into constellations that contain more
than one transaction. In however this is not the case. We will prove by induction that H and 
converge. The proof will proceed by induction over constellations (normal induction can be 
carried over any very well-founded partially ordered set, not just over natural numbers). 


5.3.1 The constellation coordinate set 

This set is made up of all the trigger events in E with the ^ partial order (see Definition [H]) . 
Definition 21. We use the symbol ℒ_c to denote the constellation set. For any transaction T the
clean trigger trig(T) ∈ E is an element of ℒ_c. We denote that element by ℓ_T. For any event e ∈ E'^
we denote ℓ_e = ℓ_trans(e). If T is a notification transaction for view i then by definition ℓ_T = ℓ_i.


5.3.2 The sub-transaction coordinate set 

This coordinate is only used for differentiating sub-transactions within donation and co-donation 
transactions. For all other events we use a special zero symbol for this coordinate. These
sub-transactions simulate the processing of packets that were supposed to be sent to or from a newly
joining process, but that were not sent due to race conditions. For the purpose of constructing Fl^ 
the labeling of the critical donation and co-donation sub-transactions is crucial. As for the
non-critical donations and co-donations (i.e. those that involve processes other than G that join after
the critical view change), their labeling is a lot less important. It would be nice to use the same
labeling scheme for all donations and co-donations, and this is indeed possible, but very complicated
and not really worth the effort. So reluctantly we use two separate systems of labeling
sub-transactions: one for the critical ones and one for the non-critical ones.

The analysis in 5.2.3 shows that the critical sub-transactions in both donation and co-donation
transactions can be ordered using related events from E. These events are all queuing events of 
packets, ordered by -<. This is true for non-critical sub-transactions as well but we have not shown 
that. So we will simply use positive integers to label those sub-transactions. Together with the 
additional special zero element this yields: 

ℒ_s = {0} + (E ⊔ ℕ)


5.3.3 The adjustment coordinate set 

This coordinate is used for slight adjustments of trigger events in i/'", forcing them to happen either 
slightly earlier or slightly later than related triggers. The set contains nine elements: 

< ^(+g) < ^(-f) < ^(+f) < < 0 < 

Where the elements refer to the time adjustment in processing the following packets: 

• JJ-*^“g): A ghost packet from -G 

• l],(+gb A ghost packet from G 

• : A flush packet from -G 
• : A flush packet from G

• : An acknowledgement packet from -G 

• An acknowledgement packet from G 

• fi-k™); A message packet from -G 

• -||(+m): A message packet from G 

5.3.4 The side-effect coordinate set 

This coordinate is used for differentiating events within a transaction (or a sub-transaction, in the 
case of a donation or a co-donation that includes multiple sub-transactions). The set contains an 
infinite sequence of elements 

ℒ_f = {0 < CODONATE < DONATE < ACK < GHOST < FLUSH < BCAST_1 < BCAST_2 < ... }

where 0 indicates a trigger event and the meaning of each of the other symbols is self-explanatory.
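The ordering of the side-effect coordinates can be made concrete with a small ranking function. The sketch below is an illustration only: the function name side_effect_rank and the string spellings of the symbols (including "BCAST_1", "BCAST_2", and so on) are assumptions, and it assumes that the chain consists of exactly the symbols named in the text, with no further element between FLUSH and the first BCAST symbol.

```python
# Finite prefix of the chain, smallest first. "0" marks a trigger event.
FIXED_SYMBOLS = ["0", "CODONATE", "DONATE", "ACK", "GHOST", "FLUSH"]


def side_effect_rank(symbol: str) -> int:
    """Map a side-effect symbol to an integer that respects the chain order.

    The fixed symbols get ranks 0..5. "BCAST_n" (n >= 1) gets rank 5 + n,
    so every BCAST symbol follows FLUSH and BCAST_n precedes BCAST_(n+1).
    """
    if symbol in FIXED_SYMBOLS:
        return FIXED_SYMBOLS.index(symbol)
    prefix = "BCAST_"
    if symbol.startswith(prefix) and symbol[len(prefix):].isdigit():
        n = int(symbol[len(prefix):])
        if n >= 1:
            return len(FIXED_SYMBOLS) + n - 1
    raise ValueError(f"unknown side-effect symbol: {symbol!r}")


symbols = ["BCAST_2", "ACK", "0", "FLUSH", "BCAST_1", "CODONATE"]
in_order = sorted(symbols, key=side_effect_rank)
```

Sorting by this rank places the trigger event first and the broadcast side effects last, in the order of their subscripts.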


5.3.5 Labeling the events in H 

As we mentioned, each label contains four coordinates, one of each type. Formally 

ℒ = ℒ_c × ℒ_s × ℒ_a × ℒ_f

If ℓ_T ∈ ℒ_c, s ∈ ℒ_s, a ∈ ℒ_a and f ∈ ℒ_f, we use the notation [ℓ_T.s.a|f] to denote the label with
those coordinates. If T is a critical donation or co-donation transaction we use the notations
[Crit(G → P).s.a|f] and [Crit(P → G).s.a|f].

We want to create a map A : G that assigns a label to each event in H and preserves order 

in the sense that if e ^ / then A(e) < A(/). 

Let e be any event in H. We create the label of e = [ℓ_T.s.a|f] coordinate by coordinate:

The constellation coordinate:

We simply use the constellation ℓ_trans(e).

The sub-transaction coordinate: 

• If trans(e) = Crit(P → G) and e is a side effect of one of the sub-transactions (the labeled
sub-transaction steps of |ReceiveDonation|) then Corollary [5] implies that there is a corresponding
queuing event f ∈ E of a packet k, where k ∈ PD is an untimely packet. We use f as the
sub-transaction coordinate for e.

• If trans(e) = Crit(G → P) and e is a side effect of one of the sub-transactions (the labeled
sub-transaction steps of |ReceiveCoDonation|) then Corollary [7] implies that there is a
corresponding queuing event f ∈ E of a packet k, where either k ∈ DP is an untimely packet or
k ∈ GP is a post-critical, pre-P-donation packet. We use f as the sub-transaction coordinate for e.








• If 4rans(e) = Crit(G —>■ P) and e is a side effect of the |TryToInstall| procedure invocation at 

labeled step [5|of |ReceiveCoDonationl then LemmaEOl implies that there is a corresponding 
event / = € E where k S Gd is the last pre-P-donation flush packet, and k = 

PpLusH (i') where v is equal to the view height cur_view + v_gap at P at the time that it 
processes the co-donation. We use / as the sub-transaction coordinate for e. 

• If $\mathrm{trans}(e)$ is a non-critical donation or co-donation transaction, and $e$ is a side effect
of one of the sub-transactions, then we label $e$ with a serial number, with all the side
effects of the first sub-transaction receiving a value of 1, the side effects of the second
sub-transaction receiving a value of 2, et cetera.

• If e is any other event then we label e with the special value 0. This includes the cases 
where e is 


— a trigger event. 

— a side effect of a transaction that is neither a donation nor a co-donation. 


— the queuing event of the co-donation packet in a donation transaction. 


— a side effect of the TryToInstall invocation in a non-critical co-donation transaction. 


The adjustment coordinate: 

In $H$ this coordinate is always 0. We only make non-trivial adjustments for events that are
added in $H^r$.


The side-effect coordinate: 

We assign this coordinate according to the side effect. If $e$ is a trigger event, we use the special
zero value 0. Otherwise $e = k^{QU}$ for some packet $k$. In that case we look at the side effect that
produced $k$:

• If $k$ is a co-donation packet then we assign the value CODONATE.

• If $k$ is a donation packet then we assign the value DONATE.

• If $k = p_{\mathrm{ACK}}(\mathit{msg})$ then we assign the value ACK.

• If $k = p_{\mathrm{GHOST}}(v)$ then we assign the value GHOST.

• If $k = p_{\mathrm{FLUSH}}(v)$ then we assign the value FLUSH.

• If $k = p_{\mathrm{MSG}}(\mathit{msg})$ then we have to differentiate between several cases:

— If $k$ is an original message packet, in the context of a message broadcast request
transaction, we assign the value $\mathrm{BCAST}_1$.

— If $k$ is an original message packet $p_{\mathrm{MSG}}(\mathit{msg})$ in the context of a flush packet
processing transaction (see Lemma fHll), we assign the value $\mathrm{BCAST}_j$ if $\mathit{msg}$ is the
$j$-th original message out of LaunchQueue.

— If $k$ is an original message packet $p_{\mathrm{MSG}}(\mathit{msg})$ in the context of a co-donation packet
processing transaction (see Lemma [m), we assign the value $\mathrm{BCAST}_j$ if $\mathit{msg}$ is the
$j$-th original message out of LaunchQueue.

— If $k$ is a forwarded message packet $p_{\mathrm{MSG}}(\mathit{msg})$ in the context of a notification
transaction (removal of process $P$), we assign the value $\mathrm{BCAST}_j$ if $\mathit{msg}$ is the $j$-th forwarded
message out of FwdQueue[P].
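The four-coordinate labels and their left lexicographic comparison can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every coordinate is modeled as a plain integer, whereas in the text the constellation and sub-transaction coordinates come from their own ordered sets; the names `SideEffect`, `bcast` and `Label` are hypothetical and do not appear in the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class SideEffect(IntEnum):
    """Fixed part of the side-effect coordinate, in increasing order."""
    TRIGGER = 0
    CODONATE = 1
    DONATE = 2
    ACK = 3
    GHOST = 4
    FLUSH = 5


def bcast(i: int) -> int:
    """Value standing for BCAST_i (i >= 1); every BCAST_i lies above FLUSH."""
    if i < 1:
        raise ValueError("BCAST index starts at 1")
    return int(SideEffect.FLUSH) + i


@dataclass(frozen=True, order=True)
class Label:
    """A label [constellation.sub_transaction.adjustment|side_effect].

    order=True makes dataclasses compare field by field, left to right,
    which is exactly a left lexicographic order.
    Assumption: all four coordinates are integers in this sketch.
    """
    constellation: int
    sub_transaction: int
    adjustment: int
    side_effect: int


# A broadcast request: the trigger event and the shared queuing event.
trigger = Label(7, 0, 0, SideEffect.TRIGGER)
queuing = Label(7, 0, 0, bcast(1))
print(trigger < queuing)
```

Because the comparison is lexicographic from the left, any event in an earlier constellation precedes every event in a later one, regardless of the remaining coordinates.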


5.3.6 Some labeling examples 


Suppose that the application at some process $P$ in $H$ decides to broadcast a message. This decision
constitutes a message broadcast request event $e_t$ that results in the invocation of the protBroadcast
procedure. If v_gap = 0 the message gets broadcast, which means that a set of message packets

$$k_{P_1}, \ldots, k_{P_n}$$

one packet per contacted process, are queued to their respective channels. All of these packets share
a single queuing event $e_q$:

$$k_{P_1}^{QU} = k_{P_2}^{QU} = \cdots = k_{P_n}^{QU} = e_q$$

If v_gap > 0 then there is no queuing event. Therefore we have a transaction

$$T = \mathrm{trans}(e_t) = \begin{cases} \{e_t \prec e_q\} & \text{if v\_gap} = 0 \\ \{e_t\} & \text{if v\_gap} > 0 \end{cases}$$

and the labeling yields

$$\lambda(e_t) = [\ell_t.0.0|0]$$

$$\lambda(e_q) = [\ell_t.0.0|\mathrm{BCAST}_1]$$


In another example, suppose that the critical joining process $G$ processes a donation from $P$. The
donated state from $P$ includes a record (msg, index, iset) in WaitSet. This record was placed in
WaitSet by $P$ when it sent an untimely message packet $k_m = p_{\mathrm{MSG}}(\mathit{msg})$. Suppose that the
record causes $G$ to invoke the ReceiveMessage procedure. When $G$ executes that call it queues an
acknowledgement packet $k_a = p_{\mathrm{ACK}}(\mathit{msg})$ to $P$. The queuing event $k_a^{QU}$ is labeled

$$\lambda(k_a^{QU}) = [\mathrm{Crit}(P \to G).k_m^{QU}.0|\mathrm{ACK}]$$

Theorem 4. The map $\lambda : E^H \to \mathcal{L}$ is order preserving.

Proof. Let $e_1, e_2 \in E^H$ be any events such that $e_1 \preceq e_2$. Let $\lambda(e_1) = [\ell_{e_1}.s_{e_1}.0|f_{e_1}]$ and $\lambda(e_2) =
[\ell_{e_2}.s_{e_2}.0|f_{e_2}]$. Let $e_1 \in E_{P_1}$ and $e_2 \in E_{P_2}$.

If $P_1 \neq P_2$ then by the minimality of $\preceq$ (Minimal Order Axiom) there must be a trigger event $e_t$ at
$P_2$ such that $e_1 \prec e_t \preceq e_2$. The events at $P_2$ are completely ordered by $\preceq$ (Process Order Axiom).
It follows from Definition [H] that $e_t \preceq \mathrm{trig}(e_2)$. By the same definition $\mathrm{trig}(e_1) \preceq e_1$. It follows that
$\mathrm{trig}(e_1) \preceq \mathrm{trig}(e_2)$ and therefore $\ell_{e_1} < \ell_{e_2}$. Since $\lambda$ is left lexicographic it follows that $\lambda(e_1) < \lambda(e_2)$
and we are done in this case.

If $P_1 = P_2$ then it follows from Definition [m that $\mathrm{trig}(e_1) \preceq \mathrm{trig}(e_2)$ and $\ell_{e_1} \le \ell_{e_2}$. If the inequality
is strict then we are done. So assume that $\ell_{e_1} = \ell_{e_2}$. This implies that $e_1$ and $e_2$ belong to the
same transaction $T$ at process $P_1$.

We will go over each transaction type and verify that the events in the transaction are labeled
with monotonically increasing labels. We rely on the analysis of side effects (see 4.2). We
will denote by $e_t$ the trigger of $T$, and denote by $e_{s_1}, e_{s_2}, \ldots$ the side effects of $T$. Remember
throughout the following analysis that $\lambda(e_t) = [\ell_t.0.0|0]$.

Message broadcast request transaction 

According to Lemma [TT] an application transaction T has one of the following forms: 

• No side effects: $T = \{e_t\}$ and there is nothing to prove.

• A message packet multicast: $T = \{e_t \prec e_{s_1}\}$ where $e_{s_1} = p_{\mathrm{MSG}}(\mathit{msg})^{QU}$

$$[\ell_t.0.0|0] < [\ell_t.0.0|\mathrm{BCAST}_1]$$


Notification transaction 

According to Lemma [12] a notification transaction T has one of the following forms: 

• No side effects: T = {et} and there is nothing to prove. 

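• A ghost packet multicast: $T = \{e_t \prec e_{s_1}\}$ where $e_{s_1} = p_{\mathrm{GHOST}}(v)^{QU}$

$$[\ell_t.0.0|0] < [\ell_t.0.0|\mathrm{GHOST}]$$

• A ghost packet multicast followed by a flush packet multicast: $T = \{e_t \prec e_{s_1} \prec e_{s_2}\}$
where $e_{s_1} = p_{\mathrm{GHOST}}(v)^{QU}$ and $e_{s_2} = p_{\mathrm{FLUSH}}(v)^{QU}$

$$[\ell_t.0.0|0] < [\ell_t.0.0|\mathrm{GHOST}] < [\ell_t.0.0|\mathrm{FLUSH}]$$

• A sequence of message packet multicasts: $T = \{e_t \prec e_{s_1} \prec \cdots \prec e_{s_n}\}$ where $e_{s_i} = p_{\mathrm{MSG}}(\mathit{msg}_i)^{QU}$

$$[\ell_t.0.0|0] < [\ell_t.0.0|\mathrm{BCAST}_1] < \cdots < [\ell_t.0.0|\mathrm{BCAST}_n]$$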

Message packet processing transaction 

According to Lemma lRTl a message packet processing transaction $T$ has the form $T = \{e_t \prec e_{s_1}\}$
where $e_{s_1} = p_{\mathrm{ACK}}(\mathit{msg})^{QU}$

$$[\ell_t.0.0|0] < [\ell_t.0.0|\mathrm{ACK}]$$

Acknowledgement packet processing transaction 

According to Lemma [T2| an acknowledgement packet processing transaction T has one of the 
following forms: 

• No side effects. 

• A ghost packet multicast. 

• A flush packet multicast. 

• A ghost packet multicast followed by a flush packet multicast. 

The monotonicity of the labeling here works just as in the notification transaction case.

Ghost packet processing transaction

According to Lemma [T3] a ghost packet processing transaction T has no side effect so there is 
nothing to prove in this case. 

Flush packet processing transaction 

According to Lemma [T3] a flush packet processing transaction $T$ has one of the following forms:

• No side effects: $T = \{e_t\}$ and there is nothing to prove.

• A sequence of message packet multicasts: $T = \{e_t \prec e_{s_1} \prec \cdots \prec e_{s_n}\}$ where $e_{s_i} = p_{\mathrm{MSG}}(\mathit{msg}_i)^{QU}$

$$[\ell_t.0.0|0] < [\ell_t.0.0|\mathrm{BCAST}_1] < \cdots < [\ell_t.0.0|\mathrm{BCAST}_n]$$


Donation packet processing transaction 

According to Lemma [TS] a donation packet processing transaction $T$ has the form of a co-donation
queuing event followed by a sequence of zero or more sub-transactions

$$T = \{e_t, e_c, e_{s_{1,1}}, \ldots, e_{s_{1,k_1}}, e_{s_{2,1}}, \ldots, e_{s_{2,k_2}}, \ldots, e_{s_{n,1}}, \ldots, e_{s_{n,k_n}}\}$$

where $e_c$ is the queuing event of the co-donation packet and $e_{s_{i,1}}, \ldots, e_{s_{i,k_i}}$ are the side effects
of the $i$-th sub-transaction.

Within each sub-transaction the sub-transaction coordinate of the label is the same for all
side effects. Let

$$\lambda(e_{s_{i,j}}) = [\ell_T.s_i.0|f_{i,j}]$$

where $s_i$ is the sub-transaction coordinate of the $i$-th sub-transaction. Then it follows from the
previous cases that for each $i$ the labels $\lambda(e_{s_{i,1}}), \ldots, \lambda(e_{s_{i,k_i}})$ are monotonic. If the donation is not
critical then $s_i = i$, and therefore $s_i < s_{i+1}$ and the whole sequence of sub-transactions is
monotonically labeled. If the donation is critical then $s_i = k_i^{QU}$ where $k_i$ is the untimely packet
that is related to the $i$-th sub-transaction by Corollary 5. The same corollary guarantees that
if $i < j$ then $k_i^{QU} \prec k_j^{QU}$. Therefore in this case as well the whole sequence of sub-transactions
is monotonically labeled. In addition

$$\lambda(e_t) = [\ell_T.0.0|0] < [\ell_T.0.0|\mathrm{CODONATE}] = \lambda(e_c)$$

and both events have a sub-transaction coordinate that is equal to 0. Therefore the whole
donation transaction is labeled monotonically in all cases.

Co-donation packet processing transaction 

According to Lemma [T5] a co-donation packet processing transaction $T$ has one of the following
forms:

• A sequence of zero or more sub-transactions

$$T = \{e_t, e_{s_{1,1}}, \ldots, e_{s_{1,k_1}}, e_{s_{2,1}}, \ldots, e_{s_{2,k_2}}, \ldots, e_{s_{n,1}}, \ldots, e_{s_{n,k_n}}\}$$

where $e_{s_{i,1}}, \ldots, e_{s_{i,k_i}}$ are the side effects of the $i$-th sub-transaction. This case is resolved
the same way as a donation transaction, with Corollary 7 guaranteeing that the order of
the sub-transaction coordinates is correct in the critical case.

• A sequence of one or more message packet multicasts: $T = \{e_t \prec e_{s_1} \prec \cdots \prec e_{s_n}\}$ where
$e_{s_i} = p_{\mathrm{MSG}}(\mathit{msg}_i)^{QU}$. If the co-donation is non-critical the labeling looks as follows

$$[\ell_t.0.0|0] < [\ell_t.0.0|\mathrm{BCAST}_1] < \cdots < [\ell_t.0.0|\mathrm{BCAST}_n]$$

If the co-donation is critical then according to Lemma 20 there is a special post-critical
flush packet queuing event $f \in E$ and the labeling is

$$[\ell_t.0.0|0] < [\ell_t.f.0|\mathrm{BCAST}_1] < \cdots < [\ell_t.f.0|\mathrm{BCAST}_n]$$


□ 


5.4 Defining The History Reduction Mapping

5.4.1 Preliminaries 

We now show how to construct a history $H^r$ that carries the same computation as $H$ but where
the critical join event of $G$ is replaced by a removal event of a different process $-G$. The interesting
part is the construction of $E^{H^r}$, which will rely on the label space that we constructed earlier. We
start with the construction of most of the other components of $H^r$:

pff" ^ pH y|_Q| 

Pf =P^U{-G} 

_ f if Z > Vcrit 

* "\sfu{G,-G} ifz<VeHt 


The main task is constructing the packets and related events on channels that involve the processes 
G and -G. We build up these channels starting with the history H as a base, and then adding and
removing packets as necessary in an attempt to simulate what would have happened if G and -G 
were original members of the group and just happened, by some rare chance, to proceed (almost) 
exactly as the parent D would up to the point of the critical view change, after which -G is removed 
and G remains to evolve as it originally did in the history H. 

The rest of this section is dedicated to the construction of the channels that involve G and -G. The
construction involves a detailed description of the packets that are added to these channels, as well
as the few packets that are removed. Together with each packet we add its associated events and
their labels. In addition we describe the requisite changes to and additions of notification events.
This construction amounts to a description of most of the missing ingredients in $H^r$, including all
the channels and all the events. To complete the definition of $H^r$ we just need to add in the partial
order on the events.

We construct the events in $H^r$ with the goal of simulating a reality where the following things
happen:

• G and -G are group members from the start. However, their upper layer application does
not get started until the critical view change. Therefore G does not originate any message 
broadcasts up to the critical point, and -G never originates any message broadcasts at all. 

• Other than message origination (and the related flush packets - see below), G and -G proceed 
exactly like D up to the critical view change. This happens by luck, as multicasts arrive at G 
and -G at about the same time that they arrive at D, and multicasts and responses from G 
and -G arrive at other processes at about the same time that similar multicasts and responses 
arrive from D. 

• Despite their participation in group communications, G and -G, by sheer luck, have no effect 
on the state of the group. This happens because acknowledgements from G and -G arrive too 
early (just before similar acknowledgements arrive from D) and therefore are never decisive 
in stabilizing messages; forwarded messages from G and -G arrive too late (just after similar 
forwarded messages arrive from D) and as a result they are discarded as duplicates; and flush 
packets from G and -G arrive too early (sometimes long before similar packets arrive from D) 
to be the deciding factor in view installation decisions at the receiving processes. 

• One area where G and -G are allowed to diverge from D in a controlled fashion is with the
multicasting of flushes. This is unavoidable because G and -G do not originate messages and 
therefore their respective wait sets tend to be emptier than the one at D. Therefore G and 
-G may be forced by the CBCAST protocol to multicast a flush long before D is ready to do so. 
Managing this difference is the whole purpose of the ghost multicasts. These multicasts do 
not depend on the instability of original messages and therefore can happen at the same time 
at G, -G and D. Flush multicasts out of G and -G, however, do not happen at the same time 
that similar multicasts occur at D. Rather they immediately follow each ghost multicast. 


5.4.2 Some notation 

It should hopefully be intuitive at this point that the simulation is achieved, to a large degree, by 
adding clone packets that mimic on ±G-bound channels what the original packets did on similar 
D-bound channels. To accommodate the fact that the new flush packets mimic original ghost 
packets rather than original flush packets, we also introduce the notion of a zombie packet. A 
zombie packet, as the name implies, is a clone of a ghost packet that comes back to life as a flush
packet, but otherwise remains unchanged. 

The label space $\mathcal{L}$ plays a crucial role in ordering the new packet events that are created along with
the new clone and zombie packets. Instead of ordering these events directly (which can get very
complicated) we label them first. Then we introduce an order on all the events in $H^r$ by inducing
it back from the labels.

The central challenge in building $H^r$ is the seam between the pre-critical period and the post-critical
period. This problem presents itself in the form of packets that are sent pre-critically, but arrive
post-critically (including packets that never arrive). Recall that such packets are called untimely.
We deal with untimely packets using the donation and co-donation transactions. In $H^r$ the clones
and zombies of untimely packets arrive exactly at the time that the receiving process begins the
sub-transaction that simulates the arrival of the original packets (see labeled steps [T] and [2] of the
ReceiveDonation procedure and labeled steps [T] and [D of the ReceiveCoDonation procedure at 3.5).
This makes the dequeuing events of donation and co-donation packets pivotal in the definition of the
reduction mapping, warranting a specific notation that we introduced in an earlier definition to describe
them.

Definition 22. Let $\mathit{msg}$ be a stamped message in $H$. The reduction of $\mathit{msg}$ is an identical message
$\mathit{msg}^r$, but with a slightly different stamping

$$\mathrm{ORIG}(\mathit{msg}^r) = \mathrm{ORIG}(\mathit{msg})$$

$$\mathrm{VIEW}(\mathit{msg}^r) = \mathrm{VIEW}(\mathit{msg})$$

$$\mathrm{VT}(\mathit{msg}^r)[\,] = \begin{cases} \mathrm{VT}(\mathit{msg})[\,] \cup \{[G] = 0,\ [-G] = 0\} & \mathrm{VIEW}(\mathit{msg}) < v_{crit} \\ \mathrm{VT}(\mathit{msg})[\,] & \mathrm{VIEW}(\mathit{msg}) \ge v_{crit} \end{cases}$$

For any data structure $A$ in $H$, the reduction of $A$ is an identical data structure $A^r$ where all the
messages contained in $A$ or its substructures have been replaced by their reductions.

If $A$ is a data structure in $H$ with reduction $A^r$ and $B$ is a data structure in $H^r$ such that $B = A^r$,
we say that $B$ is equivalent to $A$ and we denote it by $B \equiv A$.
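The reduction of a single stamped message can be sketched as follows. This is a minimal illustration, not the paper's implementation: the record type `StampedMessage`, the function `reduce_message` and the representation of the vector time as a dictionary keyed by process name are all assumptions made for the example, as is the choice to treat a message stamped exactly in the critical view as post-critical.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class StampedMessage:
    """Hypothetical stamped message: originator, view and vector time."""
    orig: str
    view: int
    vt: Dict[str, int]


def reduce_message(msg: StampedMessage, v_crit: int,
                   g: str = "G", minus_g: str = "-G") -> StampedMessage:
    """Return the reduction of msg.

    Pre-critical messages gain zero entries for G and -G in their vector
    time; all other messages are returned unchanged.
    Assumption: view == v_crit counts as post-critical.
    """
    if msg.view < v_crit:
        vt = dict(msg.vt)      # copy, so the original stamp is untouched
        vt[g] = 0
        vt[minus_g] = 0
        return replace(msg, vt=vt)
    return msg


early = StampedMessage(orig="P", view=2, vt={"P": 3, "D": 1})
print(reduce_message(early, v_crit=5).vt)
```

Reducing a larger data structure then amounts to applying `reduce_message` to every message it contains, which is how the definition above extends the notion to arbitrary structures.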

Definition 23. 1. A clone of a packet $k$ in $H$ is a new packet in $H^r$, on a different channel,
whose type is identical to the type of $k$ and whose content is the reduction of the content
$\mathrm{cont}(k)$. A clone of an event $e$ in $H$ is a new event in $H^r$ of the same type as $e$. A clone
event can occur at the same process or at a different one.

2. A zombie of a ghost packet is a clone where, unlike a normal clone, the type of the packet
changes from ghost to flush. A zombie event is a clone of a ghost packet queuing event in $H$
whose type is a flush packet queuing event.

3. A non-G process is any process $P \neq \pm G$.
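The difference between a clone and a zombie can be made concrete with a small sketch. The `Packet` record, the `PacketType` enumeration and the `reduce` callback below are hypothetical names introduced only for this illustration; the channel is modeled as a (source, target) pair and the content reduction is passed in as a function.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Any, Callable, Tuple


class PacketType(Enum):
    MSG = "msg"
    ACK = "ack"
    GHOST = "ghost"
    FLUSH = "flush"


@dataclass(frozen=True)
class Packet:
    ptype: PacketType
    channel: Tuple[str, str]   # (source process, target process)
    content: Any


def clone(k: Packet, channel: Tuple[str, str],
          reduce: Callable[[Any], Any] = lambda c: c) -> Packet:
    """Same type, different channel, reduced content."""
    if channel == k.channel:
        raise ValueError("a clone lives on a different channel")
    return Packet(k.ptype, channel, reduce(k.content))


def zombie(k: Packet, channel: Tuple[str, str],
           reduce: Callable[[Any], Any] = lambda c: c) -> Packet:
    """A clone of a ghost packet whose type changes from ghost to flush."""
    if k.ptype is not PacketType.GHOST:
        raise ValueError("only ghost packets have zombies")
    return replace(clone(k, channel, reduce), ptype=PacketType.FLUSH)


ghost = Packet(PacketType.GHOST, ("D", "P"), content=3)
print(clone(ghost, ("G", "P")).ptype, zombie(ghost, ("G", "P")).ptype)
```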


We now turn to the heart of the construction of $H^r$, which involves the construction of its channels
and events. The bulk of the work is in constructing the channels and the related packet events, and
then there is a bit of additional work in constructing the notification events. We proceed in multiple
steps. We start with constructing new queuing events and their labels. We continue with channels
where the source is ±G and the target is non-G, then channels where the source is non-G and the
target is ±G, and then the four channels (±G)(±G). We construct new trigger events and their labels
in tandem with the channel construction. Then we construct the new notification events and their
labels, and finally we construct the order out of the labels.


5.4.3 Constructing the new queuing events 

Most of the new queuing events are added to ±G. The only new queuing events that are added to
non-G processes are the queuing events of acknowledgement packets that are sent in response to
forwarded messages from ±G. This should not be a surprise because we deliberately construct $H^r$
in such a way that the additional packets emanating out of ±G are largely ignored and therefore
have no side effects to speak of.

We only create clone and zombie events for pre-critical queuing events in $H$. We are going to add
post-critical packets to $H^r$ but their queuing events will all be existing queuing events of existing
multicasts. In fact many, but not all, of the pre-critical cloned packets will become parts of existing
multicasts and will not require a new queuing event.

If $e = k^{QU}$ is a pre-critical queuing event at $D$ in $H$ labeled $\lambda(e) = [\ell_T.0.0|f]$ we add the following
new events in $H^r$:

1. If $k$ is a forwarded message packet, or an acknowledgement packet in response to any message
packet other than a forwarded message from $D$

(a) A clone queuing event $e_c \in E_G$ with label $\lambda(e_c) = \lambda(e)$

(b) A clone queuing event $e_{-c} \in E_{-G}$ with label $\lambda(e_{-c}) = \lambda(e)$

2. If A: G oi) is an acknowledgement packet in response to a forwarded message packet from D 

(a) Two clone queuing events G Eg with labels 

A(e^'^”’) = |ACK] 

A(eir'-”’) = |ACK] 

(b) Three clone queuing events Cc, efG Eg with labels 

A(ec) = A(e) 

A(ef'+”’) = |ACK] 

A(ef''”’) = [AT.O.fr^-^) |ACK] 

(c) Three clone queuing events e_c, G E_g with labels 

A(e_c) = A(e) 

A(e_^J^”’) = |ACK] 

A(e_^J'”’) = |ACK] 

3. If $k$ is a ghost packet and $\lambda(e) = [\ell_T.0.0|\mathrm{GHOST}]$

(a) A clone queuing event $e_c \in E_G$ with label $\lambda(e_c) = \lambda(e) = [\ell_T.0.0|\mathrm{GHOST}]$

(b) A clone queuing event $e_{-c} \in E_{-G}$ with label $\lambda(e_{-c}) = \lambda(e) = [\ell_T.0.0|\mathrm{GHOST}]$

(c) A zombie queuing event $e_z \in E_G$ with label $\lambda(e_z) = [\ell_T.0.0|\mathrm{FLUSH}]$

(d) A zombie queuing event $e_{-z} \in E_{-G}$ with label $\lambda(e_{-z}) = [\ell_T.0.0|\mathrm{FLUSH}]$

4. If $k$ is an original message packet or a flush packet, do not create any clones

If e = A;'^^ is a pre-critical queuing event in H at P ^ D with label A(e) = [^t- 0-0|AGK] and if 
k G pi is an acknowledgement packet, we add the following new events in H^: 

1. If A: is an acknowledgement packet in response to a forwarded message packet 

(a) A clone queuing event G Ep with label A(e'l'’*^”') = |AGK] 

(b) A clone queuing event G Ep with label A(e^^ |ACK]

2. If k is an acknowledgement packet in response to an original message packet, do not create
any clones 


5.4.4 Constructing the channels PQ for non-G processes P and Q 

The original $H$ channels are copied to $H^r$ almost without change. The only change is that each
channel is reduced (see Definition 22). This means that the content $\mathrm{cont}(k)$ of each packet $k$ in the
channel is replaced with the reduced content $\mathrm{cont}(k)^r$.


5.4.5 Constructing the channels PG and P(-G) for a non-G process P

We start with an empty channel for $P(-G)$ and with the reduction of the original channel $PG^H$ for
$PG$, and remove the critical donation packet and its events, if they exist. Then we add the following
new packets and events:

For any timely packet k G pi> 

If k is a ghost, flush or message packet 

1 . A clone packet c G pd together with = k'^'^ and a new event labeled as 
A(c^^) = X{k^^) 

2. A clone packet c' G P{-g] together with = k'^^ and a new events labeled 
as A(c'^'^) = X{k^^) 

If k is an acknowledgement packet in response to a forwarded message 

1. A clone packet c G together with ' and a new event labeled 

as A(c'’'^) = A(A:^'^) 

2. A clone packet c' G P{-g] together with ' and a new event 

labeled as A(c'^'^) = A(fc’’'^) 

If k is an acknowledgement packet in response to an original message do not create any clones. 
For any untimely packet k G Pi!) 

If k is a ghost, flush or message packet 

1. A clone packet c G P(3 together with 

(a) 

(b) If Crit(P G) exists, a new event labeled as 

A(c^^) = [Crit(P ^ G).A:«''.0|0] 

2. A clone packet c' G P(-Gj together with = k‘^^ but no event 
If k is an acknowledgement packet in response to a forwarded message

1. A clone packet $c \in PG$ together with

(a) 

(b) If Crit(P —>■ G) exists, a new event labeled as 

A(c^^) = [Crit(P ^ G).k^^.6ld] 

2. A clone packet c' € P(-G^ together with c'^'^ = but no event 

If k is an acknowledgement packet in response to an original message do not create any clones. 


5.4.6 Constructing the channels GP and (-G)P for a non-G process P

We start with the empty channel for $(-G)P$ and with the reduction of the original channel $GP^H$ for
$GP$, and remove the critical co-donation packet and its related events. Then we add the following
new packets and events:

For any timely packet k € with e = k'^^ and X{k^^) = [t'7’'.0.6|6] 

If k is a forwarded message packet 

1. A clone packet c G Gi^ together with c'^’^ = Cc and a new c^'^ event labeled as 
A(c^'^) = [fT'-6.ir^+“^|6] 


2. A clone packet -c G (-G)P together with = e-c and a new event labeled as 
A(-c^'‘) = [£T'.o.tr(-“)|o] 

If k is a ghost packet 

1. A clone packet c G Gi^ together with c'^’^ = Cc and a new c^'^ event labeled as 
A(c^'^) = [£t'.6.J|(+s)|6] 

2. A zombie packet z G Gi^ together with z'^'^ = Cz and a new z^'^ event labeled as 


3. A clone packet -c G (-G)P together with = e_c and a new -c^'^ event labeled as 
A(-c^^) = [£t'.0.^(-s)|0] 


4. A zombie packet -z G (-G)P together with -z'^^ = e_z and a new -z^'^ event labeled 
as A(-z^'^) = 

If k is an acknowledgement packet 

1. A clone packet c G Gi^ together with c‘^’^ = Cc and a new c^*^ event labeled as 
A(c^^) = [£t'. 6 .^(+")| 6 ] 


2. A clone packet -c G (-G)P together with = e_c and a new event labeled as 
A(-c^'^) = [£t'.0.J|(-’^)|6] 

If k is a flush or an original message packet do not create any clones.

For any untimely packet $k \in DP$ with $e = k^{QU}$

If k is an acknowledgement packet or a forwarded message packet

1. A clone packet $c \in GP$ together with $c^{QU} = e_c$ and, if $\mathrm{Crit}(G \to P)$ exists, a new $c^{PR}$
event labeled as $\lambda(c^{PR}) = [\mathrm{Crit}(G \to P).k^{QU}.0|0]$

2. A clone packet $-c \in (-G)P$ together with $-c^{QU} = e_{-c}$ and no $-c^{PR}$ event

If k is a ghost packet 

1. A clone packet c G together with c*^^ = Cc and if Crit(G —>■ P) exists, 
event labeled as A(c'’^) = [Crit(G —>■ P).A:'^'^.JJ.(+s)|6] 

2. A zombie packet z G Gi^ together with z'^^ = and if Crit(G —>■ P) exists, a z'’^ 
event labeled as A(z'’^) = [Crit(G —>■ P).A:'^'^.JJ.(+^)|6] 


a new c 


3. A clone packet -c G (-G)P together with -c'^^ = e_c and no -c^'^ event 


4. A zombie packet -z G (-G)P together with = e_z and no -z'’^ event 

If k is a flush or an original message packet do not create any clones. 

For any post-critical packet k G G(3 with X(k'^^) = where viewlT) < r(P) and, if 

Grit(P ^ G) exists, £t ^ Grit(P ^ G) 

If k is a ghost, flush or message packet, a clone packet c G Gi^ with c'^'^ = and if 
Grit(G —>■ P) exists, a new event labeled as A(c'’^) = [Grit(G —>■ P).A:‘^'^.6|6] 

If k is an acknowledgement, donation or co-donation packet, do not create any clones. 


5.4.7 Constructing the channels (±G)(±G) 

We start with the empty channels for the three channels involving -G and with the reduction of the
original channel $GG^H$ for $GG$. Then we add the following new packets and events (notice that by
the Self Channel Axiom there are no untimely packets in this case):

For any timely packet k G with e = k'^^ and A(fc^'^) = [fT'-6.6|6] 

If k is a forwarded message packet 

1. A clone packet c G G(3 together with c‘^’^ = Cc and a new c^*^ event labeled as 
A(c^'^) = [£T'-6.ir(+“^|6] 

2. A clone packet c' G G(-G^ together with = Cc and a new c'^'^ event labeled as 
A(c'’’^) = [fT'.6.tr(+“^|6] 


3. A clone packet -c G (-G)G together with = e_c and a new -c'’^ event labeled as 
A(-C^'^) = [fT'.0.ir(““)|0] 


4. A clone packet -c' G (-G)(-G) together with -c'‘^’^ = e_c and a new -c'^'^ event labeled 
as A(-c'^'^) = [fT'.0.^(-“)|6]

If k is an acknowledgement packet for a forwarded message

1. A clone packet c € G(3 together with c'^'^ = and a new c’^'^ event labeled as 

A(c^'^) = [£t'-64(+")|0] 

2. A clone packet c' € G(-G^ together with c''^'^ = ej* and a new event labeled 
as A(c''’^) = [^t'. 04 (+")| 0 ] 

3. A clone packet -c G (-G)G together with -c'^'^ = ' and a new -c^'^ event labeled 

as A(-c^'^) = [£t'. 04 (-’*)| 6 ] 

4. A clone packet -c' G (-G)(-G^ together with -c''^'^ = and a new -c'^'^ event 

labeled as A(-c''’^) = 

If k is a ghost packet 

1. A clone packet c G G(3 together with c*^^ = Cc and a new c^'^ event labeled as 
A(c^'^) = [£t'.64(+s)|6] 

2. A zombie packet z G G(3 together with z'^'^ = Cz and a new event labeled as 
A(z^'^) = [£T'. 64 (+f)| 0 ] 


3. A clone packet c' G G(-G) together with c'^^^ = Cc and a new c'^*^ event labeled as 
A(c'’’^) = [£t'.04(+s)|0] 


4. A zombie packet t! € G'(-G) together with = Cz and a new event labeled 
as A(z''’^) = [£T'. 04 (+f)| 0 ] 


5. A clone packet -c G (-G)G together with = e_c and a new -c’’^ event labeled as 
A(-c^^) = [£t'. 04 ^“®^| 0 ] 


6. A zombie packet -z G (-G)G together with = e_z and a new -tF^ event labeled 
as A(-z^'^) = [£T'. 04 (-f)| 6 ] 


7. A clone packet -c! € (-G)(-G) together with -c'^^ = e_c and a new -c'^^ event labeled 
as A(-c''’^) = [£t'. 04 (-s)| 6 ] 


8. A zombie packet -tI G (-G)(-G) together with -z'^^ = e_z and a new event 
labeled as A(-z''’^) = [£t'- 64 ^’^^| 0 ] 


5.4.8 Constructing the notification events 

The notification events and their labels are mostly left unchanged from $H$. We start out with $F^H$
and then apply the following additions and substitutions:

We add notifications $v_0(G)$ and $v_0(-G)$ to $F^{H^r}$ with

$$\mathrm{cont}(v_0(G)) = n_{\mathrm{START}}(G)$$
$$\mathrm{cont}(v_0(-G)) = n_{\mathrm{START}}(-G)$$

For any view $0 < i < v_{crit}$ we add notifications $v_i(G)$ and $v_i(-G)$ to $F^{H^r}$ with

$$\mathrm{cont}(v_i(G)) = \mathrm{cont}(v_i(-G)) = \mathrm{cont}(v_i(D))$$

For any non-G process $P$ we replace the contents of the critical notification

$$\mathrm{cont}(v_{v_{crit}}(P)) = n_{\mathrm{JOIN}}(G, D)$$

with

$$\mathrm{cont}(v_{v_{crit}}(P)) = n_{\mathrm{REM}}(-G)$$

We replace the contents of the critical notification

$$\mathrm{cont}(v_{v_{crit}}(G)) = n_{\mathrm{START}}(G)$$

with

$$\mathrm{cont}(v_{v_{crit}}(G)) = n_{\mathrm{REM}}(-G)$$

We add a notification $v_{v_{crit}}(-G)$ to $F^{H^r}$ with

$$\mathrm{cont}(v_{v_{crit}}(-G)) = n_{\mathrm{STOP}}$$

For any view 0 < i < Vcrit we add notification events Vi{Gy^ and r!i(-G)'’^ to Eg and E_g 
respectively, labeled as X{vi{G)^^) = X{vi{-G)^^) = [£i. 0 . 0 | 6 ] 

5.4.9 The partial order and dropped items 

We now construct the order $\preceq$ in $H^r$. We will occasionally denote it by $\preceq^{H^r}$. As we mentioned
before, we use the partial order on labels in the construction.

The relation $\preceq^{H^r}$ is defined as the transitive closure of the following primitive order relations:

1. For each packet $k$ in $H^r$:

$$k^{QU} \preceq^{H^r} k^{PR}$$

whenever the dequeuing event exists.

2. For each process $X \in P^{H^r}$ and any two events $e_1, e_2 \in E_X$:

$$e_1 \preceq^{H^r} e_2$$

if $\lambda(e_1) \le \lambda(e_2)$

3. For any parent/child pair E/ J in H, other than the critical pair D/G: 


Please note that this definition produces a well defined relation $\preceq^{H^r}$, but it will require some work
to show that it is actually a weak partial order. We will pursue that investigation a little later.
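To see why the transitive closure is always well defined while the partial-order property still needs an argument, consider the following small sketch. It is a generic illustration on finite relations given as sets of pairs, not the paper's construction; the function names and the event names in the example are hypothetical.

```python
from itertools import product
from typing import Hashable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Smallest transitive relation containing the given primitive pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_antisymmetric(rel: Set[Pair]) -> bool:
    """True if no two distinct elements are related in both directions."""
    return all((b, a) not in rel for a, b in rel if a != b)


# Queuing precedes processing, processing precedes a later side effect.
primitive = {("k_QU", "k_PR"), ("k_PR", "ack_QU")}
closure = transitive_closure(primitive)
print(("k_QU", "ack_QU") in closure, is_antisymmetric(closure))
```

The closure exists for any set of primitive pairs, but a cycle among the primitive relations would make it fail the antisymmetry check. Ruling out such cycles is the part that requires a proof.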

We need to define for each new packet whether it is sent and received, and for each new notification 
whether it is dropped. We declare that there are no dropped notifications in $H^r$. As for packets,
the packets for which we added a dequeuing event must be received, and therefore must be sent. 
But there are packets for which we did not add a dequeuing event. Since we do not need to be 
transactional or even lossless, but only conforming, we simply declare that any clone or zombie for 
which there is no dequeuing event is dropped, meaning that it is sent but not received. 


5.4.10 Implementing the user application interface 

To complete the construction of $H^r$ we have to insulate the user application from any knowledge of
the fact that group history has changed. In order to do that we have to change the implementation
of the user application interface lsee l2.d.!I)l . essentially creating a thin wrapper around it that hides 
the effects of the history changes. As part of the wrapper we add two new variables, ui_view and
ui_count, to the Replicated Data structure, and we make use of three "magic numbers" that are
defined below. The new variables count how many views were installed and how many messages
were delivered so far, in order to discover the boundary between the pre-critical and post-critical
time that would otherwise be invisible to the user application.

Definition 24 . MAGIC_VIEW is the value of cur_view at D in H at the critical time. 
MAGIC_MSG is the number of messages that are delivered at D in H by the critical time. 
MAGIC_SIZE is the number of members of view zero in H. 

We define two different implementations of the interface in $H^r$, one for ±G and one for all other
processes. As usual, we will use P to denote any process which is not ±G. We use the suffixes
"@P" and "@G" to differentiate between the two implementations. We also use the marker $r$ to
indicate the $H^r$ version of an up-call.


Procedure GroundState^r@P
    let ui_view = - MAGIC_SIZE - 2;
    let ui_count = 0;
    GroundState@P(); // call the original H version


Procedure GroundState^r@G
    let ui_view = - MAGIC_SIZE - 2;
    let ui_count = 0;
    GroundState@D(); // call the original H version at process D


Procedure ApplyMessage^r@P(msg, originator)
    ApplyMessage@P(msg, originator);


Procedure ApplyMessage^r@G(msg, originator)
    increment ui_count;
    if ui_count < MAGIC_MSG then
        ApplyMessage@D(msg, originator); // call the original D version
    end
    else
        ApplyMessage@G(msg, originator); // call the original G version
    end


Procedure ApplyJoin^r@P(pid)
    increment ui_view;
    if ui_view ≠ 0 and ui_view ≠ -1 then
        ApplyJoin@P(pid);
    end


Procedure ApplyJoin^r@G(pid)
    increment ui_view;
    if ui_view > MAGIC_VIEW then
        ApplyJoin@G(pid);
    end
    else if ui_view ≠ 0 and ui_view ≠ -1 then
        ApplyJoin@D(pid);
    end


Procedure ApplyRemoval^r@P(pid)
    increment ui_view;
    if ui_view = v_crit then
        ApplyJoin@G(G); // pid is equal to -G
    end
    else
        ApplyRemoval@P(pid);
    end


Procedure ApplyRemoval^r@G(pid)
    increment ui_view;
    if ui_view < MAGIC_VIEW then
        ApplyRemoval@D(pid);
    end
    else if ui_view = v_crit then
        ApplyJoin@G(G); // pid is equal to -G
    end
    else
        ApplyRemoval@G(pid);
    end


5.5 Basic Properties Of H^r


Now that we have defined the reduced history $H^r$, we establish its basic properties. We show that
$H^r$ is a history according to Definition [1] and indeed a conforming history according to Definition
[T]. Only after we do that can we show that $H^r$ is the history of a valid CBCAST execution and that
$H^r$ carries the same computation as $H$.

Definition 25. For any packet k in H^r define

origin(k) = k    if k is the reduction of an original packet
origin(k) = j    if k is a clone or zombie of the original packet j


Theorem 5. H^r is a history.

We start with a lemma about the relation ≺.

Lemma 21. If e and f are packet events in H^r and e ≺ f then λ(e) < λ(f).


Proof. The relation ≺ on the packet events in H^r is the transitive closure of the three primitive
relations defined in 5.4.9. If we show that each of the primitive relations is compatible with the
label order then we are done.

The second primitive relation is compatible with label order by construction, and the third primitive
relation is compatible because both notification events share the same label [fj(j).0.0|0] so the only
difficulty is showing that for every packet k' in H^r for which k'^dq exists, λ(k'^qu) < λ(k'^dq).

If the packet k' is an original H packet, then this property follows from Theorem 01 We only need 
to verify this property for packets k' that are either clones or zombies of an original packet k. 
Looking through the construction of packets, one can easily observe that one of the following 
three possibilities must occur: 

• k is a timely packet with λ(k^qu) = [ℓ_T.*.*|*] and λ(k^dq) = [ℓ_T'.*.*|*] where ℓ_T < ℓ_T'. In that
case for any clone or zombie k' of k, λ(k'^qu) = [ℓ_T.*.*|*] and λ(k'^dq) = [ℓ_T'.*.*|*]. Therefore
λ(k'^qu) < λ(k'^dq).

• k is an untimely packet with λ(k^qu) = [ℓ_T.*.*|*] where ℓ_T ≺ ℓ_vcrit. In that case λ(k'^qu) =
[ℓ_T.*.*|*] and if k'^dq exists then λ(k'^dq) = [ℓ_T'.*.*|*] where ℓ_T' = Crit(P → G) or ℓ_T' =
Crit(G → P) for some original process P. In either case ℓ_T' ≻ ℓ_vcrit, therefore λ(k'^qu) <
λ(k'^dq).

• k is a packet in GD with λ(k^qu) = [ℓ_T.*.*|*] where ℓ_vcrit ≺ ℓ_T ≺ Crit(P → G) for some original
process P. In that case λ(k'^qu) = [ℓ_T.*.*|*] and λ(k'^dq) = [Crit(G → P).k^qu.0|0]. If c is the
co-donation packet sent from G to P in the original history H then

λ(k'^qu) < [Crit(P → G).0.0|0] < [Crit(P → G).0.0|CODONATE] = λ(c^qu) <

< λ(c^dq) = [Crit(G → P).0.0|0] < [Crit(G → P).k^qu.0|0] = λ(k'^dq)

This concludes the proof. □ 

Corollary 8. Let e, f ∈ E_P be any two events at a process P in H^r. Then e ≺ f if and only if
λ(e) < λ(f).

Proof. Lemma 21 proves one direction. The other direction follows immediately from the definition
of ≺ in 5.4.9. □
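As an illustration of how label comparisons drive these arguments, the chain of inequalities from the third case of Lemma 21 can be replayed in Python. This is a sketch under assumptions that are ours, not the paper's: labels are modelled as 4-tuples compared lexicographically, clean events are replaced by hypothetical integer ranks, the empty coordinate is modelled as 0, and CODONATE is modelled as a side-effect value larger than 0.

```python
from typing import List, Tuple

# (constellation, sub-transaction, adjustment, side effect), compared lexicographically.
Label = Tuple[int, int, int, int]

CODONATE = 1  # assumption: any side-effect value that sorts after the empty value 0


def label(constellation: int, sub: int = 0, adj: int = 0, side: int = 0) -> Label:
    """Build a label; unspecified coordinates default to the empty value 0."""
    return (constellation, sub, adj, side)


def strictly_increasing(labels: List[Label]) -> bool:
    """True if every label is strictly smaller than its successor."""
    return all(a < b for a, b in zip(labels, labels[1:]))


# Hypothetical ranks of clean events: l_vcrit < l_T < Crit(P->G) < Crit(G->P).
L_VCRIT, L_T, CRIT_P_TO_G, CRIT_G_TO_P = 2, 3, 5, 8
K_QU = L_T  # the queuing event of k, used as a sub-transaction coordinate

chain = [
    label(L_T, 9, 9, 9),                # label of k'^qu, wildcard coordinates
    label(CRIT_P_TO_G),                 # [Crit(P->G).0.0|0]
    label(CRIT_P_TO_G, side=CODONATE),  # label of c^qu
    label(CRIT_G_TO_P),                 # label of c^dq
    label(CRIT_G_TO_P, sub=K_QU),       # label of k'^dq
]
```

Because the constellation coordinate is compared first, the wildcard coordinates of the first label do not affect the outcome, which is the reason the proofs only need to track the constellation label in most cases.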

Lemma 22. Let e ∈ E^r be the queuing event of a message, ghost or flush packet k and let T_e be its
target set (refer to the Process Order Axiom for the definition of target set). Let e0 = origin(k)^qu.
Then

1. if e0 is a pre-critical event that occurred at a process R then

T_e = ContactSet_R@e0 ∪ {±G}

2. if e0 is a post-critical event that occurred at G then

T_e = LiveSet_G@e0

3. if e0 is a post-critical event that occurred at R ≠ G then

T_e = ContactSet_R@e0

Proof. First take a quick look at the pseudo code to verify that every multicast of message, ghost 
and flush packets has a target set that is equal to ContactSet. 

The claims follow from a careful examination of the construction of events in H^r.

Let R be an original process with R ≠ G. Post-critically we do not add any new clones or zombies to
packets emanating from R so in this case origin(k) = k and T_e is the same as the original target set,
which proves the third part.

Pre-critically we do add clones and zombies to message, ghost and flush packets emanating from
R. If R ≠ D then all of these clones (there are no zombies) also emanate from R. If R = D then
some of the clones and zombies emanate from ±G.

Every pre-critical multicast set of message, ghost or flush packets in R contains exactly one packet
that is targeted at D. This packet gets cloned with two new targets ±G and added to the same
multicast set. This proves the first part in the case when k emanates from R.

If R = D then for every forwarded message multicast there is a cloned multicast emanating from
±G that contains the complete original target set with the addition of ±G. Original message and
flush multicasts do not get cloned to ±G. Each ghost multicast is cloned and zombied to two
consecutive multicasts, both of which have the required target set. This takes care of the first part.

As for G, to every post-critical multicast we add packets destined to the uncontacted processes. It 
follows from Lemma |S1I5|) that this makes the target set equal to LiveSet. □ 
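The three cases of Lemma 22 amount to a small case analysis, which can be written out as a Python sketch. The function name, the string process identifiers and the way the contact set and live set are passed in are all assumptions made for illustration; the paper's ContactSet and LiveSet are protocol state evaluated at the event e0.

```python
from typing import FrozenSet, Iterable


def target_set(origin_pre_critical: bool,
               origin_process: str,
               contact_set: Iterable[str],
               live_set: Iterable[str]) -> FrozenSet[str]:
    """Target set of a message, ghost or flush multicast in the reduced history.

    origin_pre_critical: whether e0 = origin(k)^qu is pre-critical.
    origin_process:      the process at which e0 occurred.
    contact_set:         ContactSet of that process at e0 (assumed given).
    live_set:            LiveSet of that process at e0 (assumed given).
    """
    if origin_pre_critical:
        # Case 1: the original targets plus both G and -G.
        return frozenset(contact_set) | {"G", "-G"}
    if origin_process == "G":
        # Case 2: post-critical multicasts from G reach every live process.
        return frozenset(live_set)
    # Case 3: post-critical multicasts from other processes are unchanged.
    return frozenset(contact_set)
```

For example, a pre-critical multicast from R with contact set {D, P} yields the target set {D, P, G, -G}, while the same multicast issued post-critically keeps the target set {D, P}.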


The following is a technical lemma that is required for proving that the channels in H^r are FIFO.
Lemma 23. Let X and Y be processes in H^r and let k1, k2 ∈ XY be distinct pre-critical packets
such that k1^qu ≺ k2^qu. Then origin(k1)^qu ⪯ origin(k2)^qu, with equality occurring exactly when
origin(k1) = origin(k2) is a pre-critical ghost packet and X = ±G.


Proof. If both k1 and k2 are original, there is nothing to prove. The case where one of them is
original and one is a clone or zombie is not possible because all the pre-critical clones and zombies
belong to channels that involve ±G while none of the original pre-critical packets do. Therefore we
can assume that both k1 and k2 are either clones or zombies of their original packets.


Let λ(ki^qu) = [ℓ_Ti.ai.bi|ci]. By Corollary 8 we know that λ(k1^qu) < λ(k2^qu) and therefore
trig(T1) ⪯ trig(T2). By going through the different cases of 5.4.5, 5.4.6 and 5.4.7 it is easy to
verify that for any pre-critical packet k with λ(k^qu) = [ℓ_T.a.b|c], the event origin(k)^qu has the
same constellation label ℓ_T. We also know that for any event e in H the constellation label is
ℓ_trans(e) (see 5.3.5). Therefore trig(Ti) = trig(origin(ki)^qu).


If trig(T1) ≠ trig(T2) then by Definition [11] origin(k1)^qu ≺ origin(k2)^qu and we are done. Otherwise
T1 = T2 so origin(k1) and origin(k2) are queued in the same transaction. Denote T = T1 = T2. We
also know that origin(k1) and origin(k2) belong to the same channel. This channel is obtained from
XY by replacing any occurrence of ±G with D.


Going over the complete list of different possibilities of transaction content for T we have: 


• T comprises a trigger plus one or more multicasts of original or forwarded message packets.
This happens if T is a message broadcast request transaction, or if T is a notification
transaction that resulted in the forwarding of messages out of FwdQueue[P], or if T is a flush
packet transaction that caused the installation of a new view and the broadcasting of one or
more messages out of LaunchQueue. In these cases origin(k1) and origin(k2) are both pre-critical
message packets. Going over the different cases in 5.4.5 - 5.4.7 shows that in this case
λ(ki^qu) = λ(origin(ki)^qu) and the lemma follows.

• T comprises a trigger and the queuing of exactly one acknowledgement packet k. This happens
if T is triggered by the processing of a message packet. In this case it must be that origin(k1) =
origin(k2) = k. But an examination of 5.4.5 - 5.4.7 shows that distinct clones of pre-critical
acknowledgement packets belong to different channels, contrary to the assumption that k1
and k2 belong to the same channel. So this case does not occur.


• T comprises a trigger followed by a single multicast of ghost packets. This happens if T
is triggered by the processing of an acknowledgement packet that cleared out FwdWaitSet
while BcastWaitSet remained non-empty, or if T is triggered by an n_REM(P) notification event
that occurred when FwdQueue[P] and FwdWaitSet were empty while BcastWaitSet was not
empty. In this case there is a single packet at each relevant channel, and therefore origin(k1) =
origin(k2). An examination of 5.4.5 - 5.4.7 shows that the clones and zombies of a pre-critical
ghost packet reside on the same channel exactly when the original packet is sent out of the
process D, in which case it generates one clone and one zombie on each relevant channel. In
this case X = ±G and the labeling forces the clone to precede the zombie, and therefore in
this case k1 is the clone and k2 is the zombie of the same original packet.

• T comprises a trigger followed by a single multicast of flush packets. This happens if T is
triggered by the processing of an acknowledgement packet that cleared out BcastWaitSet while
FwdWaitSet was already empty. In this case there is a single packet at each relevant channel
and therefore origin(k1) = origin(k2). An examination of 5.4.5 - 5.4.7 shows that distinct
clones of a pre-critical flush packet reside on different channels, contrary to our assumption
that k1 and k2 reside on the same channel. So this case does not occur.

• T comprises a trigger followed by a single multicast of ghost packets, followed by a single
multicast of flush packets. This happens if T is triggered by the processing of an
acknowledgement packet that cleared out FwdWaitSet while BcastWaitSet was already empty, or if
T is triggered by the processing of an n_REM(P) notification that occurred when FwdQueue[P],
FwdWaitSet and BcastWaitSet were all empty. In this case there is in each relevant channel
a single ghost packet followed by a single flush packet. There are two possibilities here. If T
occurs in process D (in which case X = ±G) then the flush packet generates no clones and
the ghost packet is the original of both k1 and k2 which are a clone and zombie, respectively,
of their shared original. If X ≠ ±G then the ghost packet and the flush packet each generate
a single clone (but no zombie) on each relevant channel, where the labeling forces the ghost
clone to precede the flush clone. Therefore k1 is the ghost clone and k2 is the flush clone, and
the clones are sent in the same order as their originals.

• T comprises a trigger and no other event. This happens if T is triggered by the processing of a
ghost packet, or if T is triggered by a flush packet that did not cause the installation of a new
view, or if T is triggered by a flush packet that did cause the installation of a new view while
LaunchQueue was empty, or if T was triggered by the processing of an acknowledgement packet
such that FwdWaitSet was not empty when processing started and remained non-empty when
processing concluded, or if T was triggered by an acknowledgement packet for an original
message packet such that FwdWaitSet was empty and BcastWaitSet was not empty when
processing started, and BcastWaitSet remained non-empty when processing concluded, or if
T is triggered by the processing of an n_REM(P) notification that occurred when FwdQueue[P]
was empty while FwdWaitSet was not empty. This case does not occur, of course, since we
know that at least origin(k1) must have been queued to be sent as part of T.

□ 

Proof of Theorem 5. Some of the axioms are easy to check. The View Interval Axiom and the
View Change Axiom are trivial. The Channel Axiom and the Packet Event Axiom are true by
construction. The Notification Event Axiom and the Parent Axiom are trivial. The GMS Axiom
follows directly from the construction of notifications in H^r.

The Notification Order Axiom says that view notifications are processed in order. This property
is induced directly from H for all processes other than ±G. In the case of -G the axiom follows
because we explicitly added a processing event for each notification except for the last critical one.
In the case of G, all the post-critical notification processing events (including the critical one) exist
because H is transactional, while all the pre-critical ones are added explicitly. The processing order
is induced directly from the labels of the events.

The Process Order Axiom has two parts. The first part claims that events within a single process 
are linearly ordered. The second part claims that multicast sets are finite. To prove the first part, 
notice that at each process P the constellation coordinate is made up of clean events at process 
P in H, and this set is linearly ordered since the Process Order Axiom holds in H. Within each 
constellation the other coordinates are also taken from linearly ordered sets: this is trivial for the 
adjustment and side effect coordinates since the sets £a and £f are linearly ordered. As for the 
sub-transaction coordinate, we have the following cases: 

• Non-critical donation and co-donation constellations have the same events in H^r as they do
in H, and have the same labels. Since the labels are linearly ordered in H, they are linearly
ordered in H^r.

• In a critical donation constellation, the side-effect events in H^r consist of the sub-transaction
events from H (the co-donation queuing event is removed) and have additional trigger events.
All of these events get their sub-transaction coordinates from the space

{k^qu ∈ E | k ∈ PD}

where P is the sender of the donation packet. This space is linearly ordered by the
Process Order Axiom in H.

• In a critical co-donation constellation, the side-effect events in H^r consist of all the
side-effect events in H with the addition of new trigger events. All of these events get their
sub-transaction coordinates from one of two sets:

C1 = {k^qu ∈ E | k ∈ DP}

C2 = {k^qu ∈ E | k ∈ GD}

Each of these sets is linearly ordered by the Process Order Axiom in H. Taken together they
are still linearly ordered because all the elements in C1 precede ℓ_vcrit and all the elements in
C2 succeed it.

• In a non-donation constellation, all the events have the 0 value for the sub-transaction coor¬ 
dinate. 

It follows that the λ function linearly orders E_P at each process P. By Corollary 8 the first part
of the axiom now follows.

The second part of the axiom follows immediately from Lemma 22.

To verify the Process Liveness Axiom and the Piggyback Axiom, notice that they trivially hold
for packets exchanged between non-G processes. As for original packets exchanged between G and
non-G processes, almost nothing changes except that from the point of view of the non-G processes
in H^r process G had been a member from the start. This does not invalidate either axiom.

The clones and zombies of timely packets are either queued at ±G or dequeued at ±G at essentially 
the same time that the original is queued or dequeued at D (any differences in the label of the 
original versus the clone or zombie occur at the second or third coordinate). The same is true of 
untimely clones and zombies as far as queuing time is concerned. As for processing time, the first
label coordinate for the processing event of untimely clones or zombies, when such an event exists,
is either Crit(P → G) or Crit(G → P) for some non-G process P. Since this label corresponds to
the processing transaction of a P-donation at G or a G-codonation at P in the original history,
the axioms follow in this case from the same axioms in the original history as they pertain to the
donation or co-donation packet.

As for clones of a post-critical packet k ∈ GD, such a clone is created only if the original packet
is queued while P appears alive to G, and its processing event label has Crit(G → P) as a first
coordinate, which verifies the axiom using the same argument as before.

The Self Channel Axiom holds trivially for original packets k. The only cloned packets on self
channels exist on GG and (-G)(-G). If such a cloned packet is pre-critical, then the original Self
Channel Axiom guarantees that it is the clone of a timely packet and the axiom follows easily from
that in that case. Since neither of the two channels contains clones of post-critical packets, we are
done.

The Request Event Axiom is trivially induced from H. 

The Order Foundation Axiom follows from Corollary [8] as long as £ is itself very well founded. This 
in turn follows from the Order Foundation Axiom in H. 

The Minimal Order Axiom is true in H^r by definition (see 5.4.5).

The First Halting Axiom would follow if we show that we only add a finite number of new events
in H^r. Indeed, all the timely and untimely packets that we add are clones and zombies of packets
that are queued before the critical view change notification. It follows from the Order Foundation
Axiom and the View Change Axiom (in H) that there is only a finite number of such packets.

The post-critical packets that we add are all clones of post-critical packets in GD that are queued
before G processes a donation packet from some original process P ≠ G or before G receives a
removal notification for P. If G halts then there is a finite number of such packets by the First
Halting Axiom (in H). If G does not halt but P is removed then G processes the removal notification
of P (because H is transactional) and therefore queues only a finite number of packets before that
event by the Order Foundation Axiom.

If G does not halt and P is not removed then P does not halt (because H is conforming and 
not stunted, see Lemma E]). In that case P must dequeue and process to completion the join 
notification of G (because H is transactional) and therefore must queue a donation packet d to G. 
All the packets queued by P, including d, must be sent (by the Third Halting Axiom) and none of 
them are dropped (because there are no dropped packets in a transactional history). Therefore G 
receives all of them, and since G does not halt it must dequeue all of them, including d (by the Third 
Halting Axiom). Therefore G queues a finite number of packets prior to the d^dq = Crit(P → G)
event (by the Order Foundation Axiom) and we are done.

The Second Halting Axiom carries over to H^r for every non-G process. The first part of the axiom
holds for ±G because there are no dropped notifications in H^r. The second part holds for -G
because it halts, and for G because it processes all of its notifications.

The Third Halting Axiom has three parts. The first two parts follow from the same axiom in H
and the fact that each clone and zombie k which is not processed is dropped. The third part follows
for the same reason, though not so trivially. Suppose that a packet k meets the criteria of the third
part of the axiom. Then k is received and therefore is not dropped. If k is a clone or a zombie then
k must be processed because all unprocessed clones and zombies are dropped. If it is original and
all its predecessors are received, then surely its original predecessors are received, and therefore in
H the packet k is processed, and therefore it is processed in H^r as well.

The Fourth Halting Axiom is trivially induced from H. 

The main challenge is verifying the last remaining axiom, the Packet Order Axiom, which says that
the channels are FIFO. This is crucial but unfortunately quite tedious. Our basic approach is the
following. We take two packets k1 and k2 that meet the assumptions of the axiom. Namely, they
belong to the same channel XY, with k1^qu ≺ k2^qu, and with k2^dq existing. To verify the axiom
we have to show that k1^dq exists and precedes k2^dq. To do that we look at the original packets
origin(k1) and origin(k2) and use the FIFO property in H to glean information about their events
and labels. From there we derive information about the labeling of the k1 and k2 events, which in
turn determines their order in H^r (see Corollary 8).
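The FIFO condition just stated can be checked mechanically on a finite channel. The following Python sketch is only an illustration of the condition itself, not of the proof: labels are modelled as integer tuples compared lexicographically (our assumption), and the `Packet` record and function name are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

Label = Tuple[int, ...]


@dataclass(frozen=True)
class Packet:
    name: str
    qu: Label                   # label of the queuing event
    dq: Optional[Label] = None  # label of the dequeuing event, None if it does not exist


def fifo_violations(channel: Sequence[Packet]) -> List[Tuple[str, str]]:
    """Return the pairs (k1, k2) on one channel that violate the FIFO condition.

    A pair violates the condition when k1 is queued before k2, k2 is dequeued,
    and k1 is either never dequeued or not dequeued before k2.
    """
    bad = []
    for k1, k2 in permutations(channel, 2):
        if k1.qu < k2.qu and k2.dq is not None:
            if k1.dq is None or not (k1.dq < k2.dq):
                bad.append((k1.name, k2.name))
    return bad
```

A channel on which a later packet is dequeued while an earlier one is not yields a violation, whereas a channel whose tail is simply left unprocessed does not.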

We look at different cases according as k1 and k2 are pre-critical or post-critical. The condition
k1^qu ≺ k2^qu implies that if k2 is pre-critical, then so is k1. Also notice that a packet k is
pre-critical if and only if origin(k) is pre-critical.

Case I: k1 and k2 are both pre-critical

In this case we can use Lemma 23 and conclude that origin(k1)^qu ⪯ origin(k2)^qu. According to
the same lemma, if there is an equality then origin(k1) = origin(k2), the common origin is a ghost
packet and X = ±G. In this case a routine inspection of the different cases of 5.4.5 - 5.4.7 shows
that k1^dq exists whenever k2^dq does, and precedes the latter as well. So we can assume that
origin(k1)^qu ≺ origin(k2)^qu and therefore origin(k1) and origin(k2) are distinct.

If origin(k2) is timely then the Packet Order Axiom for H implies that origin(k1)^dq exists and
moreover origin(k1)^dq ≺ origin(k2)^dq. This in turn implies that origin(k1) is a timely packet and
therefore k1 is timely as well and therefore k1^dq exists. Let Ti be the transaction triggered by
origin(ki)^dq. Then trig(T1) ≺ trig(T2) and so ℓ_T1 ≺ ℓ_T2.

Going over 5.4.5 - 5.4.7 one can verify that if k is a timely clone or zombie packet then λ(k^dq)
shares the same constellation coordinate with λ(origin(k)^dq). Therefore the constellation coordinate
of ki^dq is ℓ_Ti and it follows from Corollary 8 that k1^dq ≺ k2^dq and we are done.

We are left with the case where origin(k2) is untimely. We know that k2^dq exists. Going over
the untimely cases of 5.4.5 - 5.4.7 one can verify that λ(k2^dq) = [ℓ_T.origin(k2)^qu.*|0] where
ℓ_T = Crit(P → G) or ℓ_T = Crit(G → P). In particular we know that T exists. In this case if
origin(k1) is timely then k1^dq exists and is timely and therefore precedes the untimely k2^dq. If
origin(k1) is untimely, it follows from the existence of T that k1^dq also exists and has a label
[ℓ_T.origin(k1)^qu.*|0]. This implies that λ(k1^dq) < λ(k2^dq) and therefore k1^dq ≺ k2^dq.

Case II: k1 is pre-critical and k2 is post-critical

The case where k1 is timely is trivial, because for a timely packet k1^dq exists by definition and
again by definition k1^dq ≺ ℓ_vcrit ≺ k2^dq. So we can assume that k1 is untimely.


Since we do not add any new packets to non-G channels, and since any channel involving -G does
not contain any post-critical packets, we can confine our attention to the channels GP, PG and
GD where P is a non-G process.

Look at the channel PG. We do not add any new packets to this channel that are post-critical.
Therefore k2 must be original. Therefore we can assume that k1 is not original (otherwise there
is nothing to prove). We know that k2^dq exists. Denote by T the transaction that is triggered by
k2^dq.

We know that the first post-critical packet (in H) that P queues to PG is the donation packet
d, and since P queues some post-critical packet to G in H (namely k2), it must also queue d.
Moreover, k2 ≠ d, since k2 survives in H^r and d does not (see 5.4.5). Therefore d^qu ≺ k2^qu and the
Packet Order Axiom in H implies that d^dq exists and precedes k2^dq in H. Therefore Crit(P → G)
exists and Crit(P → G) ≺ ℓ_T. Since k1 is a clone or a zombie, the existence of Crit(P → G) implies
that k1^dq exists and λ(k1^dq) = [Crit(P → G).*.*|0]. On the other hand λ(k2^dq) = [ℓ_T.*.*|*] and
Crit(P → G) ≺ ℓ_T. Therefore λ(k1^dq) < λ(k2^dq) and therefore k1^dq ≺ k2^dq and we are done.

Look at the channel GP. Since GP does not have pre-critical packets, k1 cannot be original.

We assume first that k2 is original. We proceed just like in the case of PG. We know that the
first post-critical packet (in H) that G queues to GP is the co-donation packet c, and since G
queues some post-critical packet to GP in H (namely, k2), it must also queue c. Moreover, k2 ≠ c,
since k2 survives in H^r and c does not (see 5.4.5). Therefore c^qu ≺ k2^qu and the Packet Order
Axiom in H implies that c^dq exists and precedes k2^dq in H. Therefore Crit(G → P) exists and
Crit(G → P) ≺ ℓ_T, where T is the transaction triggered by k2^dq. Since k1 is a clone or a zombie,
the existence of Crit(G → P) implies that k1^dq exists and λ(k1^dq) = [Crit(G → P).*.0|0]. On the
other hand λ(k2^dq) = [ℓ_T.*.*|*] and Crit(G → P) ≺ ℓ_T. Therefore λ(k1^dq) < λ(k2^dq) and therefore
k1^dq ≺ k2^dq and we are done.

If k2 is not original then origin(k2) ∈ GD (this is the only channel that produces post-critical
clones). By assumption k2^dq exists, so by 5.4.6 Crit(G → P) exists and

λ(k2^dq) = [Crit(G → P).origin(k2)^qu.0|0]

In particular the co-donation packet c ∈ GP exists, and is processed, in H.

k1 is an untimely clone or a zombie. The construction of 5.4.5 - 5.4.7 shows that when Crit(G → P)
exists, k1^dq exists and λ(k1^dq) = [Crit(G → P).origin(k1)^qu.0|0]. In the clean event order on E
we have

origin(k1)^qu ≺ ℓ_vcrit ≺ origin(k2)^qu

Therefore λ(k1^dq) < λ(k2^dq) and therefore k1^dq ≺ k2^dq and we are done in this case as well.

We are left with channel GD. This channel does not contain untimely packets, so there is nothing
to consider in this case.

Case III: k1 and k2 are both post-critical

If both k1 and k2 are original packets there is nothing to prove. The only post-critical clones
(there are no zombies) occur in GP channels where P is a non-G process. So we can assume that
k1, k2 ∈ GP and at least one of them is not original.

If k2 is not original then it is a clone of a post-critical packet in GD with λ(origin(k2)^qu) =
[ℓ_T.*.0|*]. If Crit(P → G) exists then ℓ_T ≺ Crit(P → G). Since k2^dq exists it follows by 5.4.6 that
Crit(G → P) must exist and λ(k2^dq) = [Crit(G → P).origin(k2)^qu.0|0]. This means in particular that
the co-donation packet c from G to P exists, and is processed, in H. It also means, a fortiori, that
Crit(P → G) exists.

Suppose that k1 is original. Since c is the first packet, in H, that is sent from G to P, and since
k1 ≠ c (it survives in H^r and c does not), we have c^qu ≺ k1^qu. Therefore

λ(c^qu) = [Crit(P → G).0.0|CODONATE] < λ(k1^qu)

On the other hand

λ(origin(k2)^qu) = [ℓ_T.*.0|*] < [Crit(P → G).0.0|CODONATE] = λ(c^qu)

By 5.4.6 λ(k2^qu) = λ(origin(k2)^qu) and so λ(k2^qu) < λ(k1^qu) and therefore k2^qu ≺ k1^qu contrary
to our assumption. So this case is not possible.

So we know that if k2 is a clone then k1 must be a clone as well, and since Crit(G → P) exists we
know by 5.4.6 that k1^dq exists and λ(k1^dq) = [Crit(G → P).origin(k1)^qu.0|0]. Since origin(k1) and
origin(k2) both belong to GD and since

λ(origin(k1)^qu) = λ(k1^qu) < λ(k2^qu) = λ(origin(k2)^qu)

we have origin(k1)^qu ≺ origin(k2)^qu and so by 5.3.2 λ(k1^dq) < λ(k2^dq) and so k1^dq ≺ k2^dq and we
are done.

We are left with the case where k2 is original and k1 is not. Using the same argument that we
used with an original k1, we can establish that c^qu ≺ k2^qu, where c is the co-donation packet sent
from G to P in H. We know by assumption that k2^dq exists. So we can conclude from the FIFO
property in H that c^dq exists and c^dq ≺ k2^dq. In particular Crit(G → P) exists. Let T be the
transaction triggered by k2^dq. Then Crit(G → P) ≺ ℓ_T.

Since k1 is a clone and Crit(G → P) exists, we conclude from 5.4.6 that k1^dq exists and

λ(k1^dq) = [Crit(G → P).origin(k1)^qu.0|0] < [ℓ_T.*.*|0] = λ(k2^dq)

and therefore k1^dq ≺ k2^dq. This concludes the proof. □

Theorem 6. H^r is a conforming history.

Proof. We already know that H^r is a history, but we still have to confirm the conforming axioms,
using the fact that H is a conforming history.

The Conforming Channel Axiom. 

If either P = (-G) or Q = (-G) we are done because -G halts and is removed in H^r. All the
other channels in H^r are original H channels to which we added clones and zombies and from
which we removed the critical donation and co-donation packets. Since the latter are finite
in number, any original channel that is finite in H^r is also finite in H, and it follows from the
conformity of H that either P is removed in H or Q halts in H. These properties carry over
directly to H^r.

The Conforming Packet Axiom.

Suppose that this axiom is violated. Then there are processes X and Y and a packet k ∈ XY
such that


• Vi{Y) = nREM{-’^) and exists. 

• exists and ^ 

Since we do not change the non-G channels, at least one of X and Y must be equal to ±G.
If k is original then it does not violate the axiom. All the clones and zombies that we add
to the (±G)(±G) channels are timely, and we add only pre-critical packet processing events
in channels where X = (-G). Therefore k must be a clone or zombie on a GP channel or a
P(±G) channel.

In the case of a GP channel k must be untimely, and since it is processed the constellation
label of k^dq is Crit(G → P) and the critical co-donation packet is processed at P. It follows
from the Conforming Packet Axiom in H that Crit(G → P) ≺ ℓ_r(G) and we are done.

The case of a P(±G) channel and an untimely k is similar. If k is timely then the constellation
label of k^dq is equal to the clean event origin(k)^dq. By the Conforming Packet Axiom in H
we know that origin(k)^dq ≺ ℓ_r(P) and we are done in this case as well.

The Conforming Notification Axiom. 

By construction H^r has no dropped notifications, so the axiom is vacuously true.

The Conforming GMS Axiom. 

The new process -G has a finite view interval. If P ≠ (-G) and P halts then P exists in H
and has a finite view interval there. This property carries over to H^r.

The Conforming Parent Axiom. 

The history H is transactional and therefore has no uninitialized processes. This property
carries over to H^r for all the processes with the exception of -G. But -G itself is also initialized
because it processes all of its notifications. Therefore this property holds vacuously in H^r.

The Conforming Halt Axiom. 

This property carries over to H^r for all the processes in H. The property holds for -G as well
because it halts.

□ 


6 The History Equivalence Theorem 

6.1 Introduction 

We are now ready to prove the fundamental property of H^r, namely, that it performs the same
calculation as H.

The proof proceeds by induction on the partially ordered constellations of £. We will formulate
an inductive hypothesis that correlates the state of the processes in H and in H^r prior to a given
constellation. We will show that under the hypothesis, if the processes of H^r process the triggers of
the constellation according to the CBCAST algorithm, each would generate and queue the exact same
packets, and in the same order, that are observed in the H^r transaction for that trigger. Moreover,
the post-processing state of the processes in H^r will continue to relate to the state of the processes
in H according to the inductive hypothesis once the whole constellation is processed.

The most important part of the rather elaborate inductive hypothesis is what it says about the
eventual state of H and H^r, namely that the state becomes identical. This means that the
calculation carried out by the two histories is the same calculation. This makes it possible to carry
over desirable properties like coherence and progress from H^r to H. By repeating that step we can
ultimately carry over these properties from relatively simple histories that do not have any process
joins to the more intractable histories that have any finite number of such joins.

Theorem 7 (History Equivalence Theorem). H^r is the history of a CBCAST and APP computation
that performs the same computation that H does. Specifically

• H^r delivers the same messages and view installations in the same order and at the same
constellation as H does at any process P ≠ ±G.

• H^r delivers the same messages and view installations in the same order and at the same
constellation as H does at process G after the critical moment.

• H^r delivers the same messages and view installations at processes ±G in the same order and
at the same constellation as H does at process D before the critical moment.

• At some point the states of H^r and H become identical.

Where a message delivery is an invocation of the ApplyMessage up-call and a view installation is 
an invocation of the ApplyJoin or ApplyRemoval up-calls. 


6.2 Proof plan and preliminaries 

The claim of the theorem is a little subtle. H is a history that arises naturally from an execution of
CBCAST and APP. But H^r is a synthetic history whose trigger events occur for no underlying reason.
To prove the theorem we have to endow H^r with an execution that gives rise to its arbitrary
behavior. We do that inductively, one constellation at a time.

At the beginning of time we have each original process in H' initialized by invoking the protStart 
call, using 

roster' = roster ∪ {±G} 

as the roster. This causes the processes of H' to initialize to a specific initial state. We will show 
that this state is similar to the initial state of H, where similarity of state is a rather complex 
relationship that we will define later. This similarity forms the basis of our inductive process. The 
induction is by the partially ordered constellations of £c (see Definition [21]) . 

Recall from 5.3 that H- and H'-constellations were defined as sets of events that share the first 
coordinate, called the constellation coordinate, in their labels. Since each constellation coordinate is 
a clean H-trigger, there is a very close relationship between H'-constellations and H-constellations. 
In fact in most cases an H'-constellation is essentially an original H-transaction or a set of clones 
of an H-transaction. The exceptions are the critical donation and co-donation transactions of 
H. Each of these gets broken down into a sequence of H' transactions which correspond to H 
sub-transactions and which can be distinguished by their sub-transaction labels. 

For each constellation we assume the following about H' and H: 

1. The starting states of H' and H are similar. 

2. The starting states of the APP thread in H' and H are identical. 

3. The next execution interval of the APP thread will occur at exactly the same time in H' and H 
at every process. In particular, since the constellation consists of at most a single transaction 
per process in H, the APP thread will not continue running in H' until the conclusion of the 
constellation, despite the fact that it may consist of multiple transactions. 

The last assumption reflects a degree of freedom that we have in weaving the APP computation 
into H'. After all, we only have to show that H' can arise as a history of a CBCAST and APP 
computation, not that it must arise. 


The only difficulty with the timing of the APP thread occurs at its inception. In H, the APP thread 
at G is launched when G installs the critical view. In H', however, the thread is launched at the 
beginning of time by the protStart procedure, since G is original in H'. However, the launch is 
asynchronous, meaning that the thread is not executed immediately but at some indeterminate 
point in the future. Since the launch is earlier in H', we can simply assume that the execution is 
delayed long enough to coincide with the execution in H. At process -G in H' the launch also 
occurs at the beginning of time, but we can assume that the execution is delayed until -G halts and 
therefore never occurs. 


Under these assumptions we demonstrate the following conclusions: 

• Using the CBCAST callbacks to execute the constellation in H' results in an ending state of 
H' that is similar to the state of H at the end of the same constellation. 

• The side effects that are generated by the CBCAST callbacks are identical to the observed side 
effects in H'. In other words the current constellation looks like it has come about as a result 
of a CBCAST execution rather than an arbitrary choice of side effects. 

• The message deliveries and view installations that are generated by the CBCAST callbacks in 
H' are identical to those generated in H, in the sense that was elaborated in the statement 
of the theorem. As a result any information that is visible to APP remains identical at the end 
of the constellation. 


This looks like it is enough for carrying the induction forward, but it is not quite enough, because 
even though we have shown that the side effects of the current constellation are generated by CBCAST 
rather than being arbitrary, we have not shown that any subsequent trigger events are non-arbitrary. 

So suppose that C is a constellation in H', and suppose that every preceding constellation has 
been shown to arise from a CBCAST and APP execution. Why should this execution give rise to the 
triggers of C? Look at any trigger in C: 

If the trigger is a dequeuing of a notification then there is no problem, because CMS is part of 
and we are allowed to control its behavior arbitrarily. 

If the trigger is a dequeuing event of a packet, then the packet was queued in a previous constellation 
and therefore was a result of a CBCAST and APP execution. The dequeuing of the packet at this point 
is the result of the timing (or labeling) that we built into H' explicitly in order to have packets 
dequeue at their destinations at improbable, but incredibly convenient times. 

If the trigger is a dequeuing of a message broadcast request then its existence depends on APP 
actually having produced the same requests in H' and in H at the same time. The only way for 
this to happen is for the APP thread to be exactly identical in both histories, and this can only be 
guaranteed by perfectly masking the differences between H' and H from APP. For this to happen 
we need three conditions: 

• The APP thread must have been in an identical state in both histories at all processes at the 
end of the previous constellation. 

• The APP thread must have seen the exact same information since that time. 

• The APP thread execution must have proceeded at the exact same speed at the exact same 
intervals in both histories and must not have intermingled with constellation executions. 

We have shown that the first two conditions are met, and are at liberty to assume the third, as we 
have seen. Therefore we can conclude that APP could have issued the same request.^ 

With these observations we can conclude that the triggers of C are indeed a result of a CBCAST and 
APP execution and that the three inductive hypotheses (1), (2) and (3) above continue to hold. 

The definition of state similarity is somewhat complex and varies depending on the period in the 
life of the process into which the constellation falls. Each process goes through three periods, the 
pre-critical, interim and convergent periods. The pre-critical period includes all the transactions 
that occur prior to the critical notification and the convergent period includes all the transactions 
that occur after the critical view is installed. The interim period includes all the constellations that 
occur while the critical view is pending installation. 

Here is the crux of the matter: in the convergent period, similarity becomes equality, and the two 
histories converge as claimed. 

It turns out that desirable properties like Causal Order and Progress carry over from H' to H. By 
iterating the history reduction process we can ultimately carry over these properties from relatively 
simple histories that do not have any process joins to the more intractable histories that have any 
finite number of such joins. Later we will show that join-free histories enjoy the Causal Order 
Property and the Progress Property. As a result both properties hold for finite-join histories. We 
will show that the Causal Order property holds for histories with an infinite number of joins as 
well. This is not true for the Progress Property. 


6.3 Side Effects in H' 

In our model each side effect is a queuing event. A queuing event is a multicast (in the case of 
a message, ghost or flush packet) or a unicast (in the case of an acknowledgement, donation or 
co-donation packet). In a queuing event a process P queues a set of identical packets to outbound 
channels, bound for a target set of processes (see the Packet Event Axiom). Before we can make 
inductive arguments, we must relate the observed side effects in H' to the observed side effects in 
H. 

^We could have simplified this argument by replacing APP in H' with a random oracle that by sheer luck broadcasts 
the same messages that APP issues. However we thought it was significant that the argument could be carried forward 
with the same user application and without resorting to an artificial oracle. 

As we prove the History Equivalence Theorem, we will repeatedly make the argument that the 
execution of CBCAST in H' produces the observed H' side effects. Each observed side effect in H' 
has two characteristics: the type and content of the packet that is being queued, and the target 
set of the multicast or unicast. In each case we will have to show that the CBCAST code execution 
produces the observed type of packet with the observed content. As for the target set, all the 
multicasts in CBCAST (one step of protBroadcast, one step of protRemove and two steps of CheckFlush) 
use ContactSet as the target set. We will establish in a later lemma that in each case the 
observed target set is equal to the value of ContactSet that exists at the process in H' at the time 
of the multicast. Unicasts will not present a similar problem. 


6.4 The inductive hypothesis 

The inductive hypothesis relates the states of certain processes in H and H'. The complete set of 
variables that make up the state of a process is listed in 3.4.1. The inductive hypothesis is complex 
enough to warrant a preliminary discussion. 

Thanks to labeling we have a common “timeline” for H and H', namely the common constellation 
partial order. We divide this common timeline into three periods: the pre-critical period, the 
interim period and the convergent period. The pre-critical period is the interval of time up to the 
critical view change constellation ivcru- The interim period ends at a process when that process 
installs the critical view. This boundary occurs at a different constellation at each process. Process 
-G is removed at the end of the pre-critical period, so it does not have an interim period. The 
convergent period starts at the end of the interim period and continues indefinitely. The inductive 
hypothesis is divided into separate hypotheses for each time period. The most important of those 
is the hypothesis for the convergent period, which states that H and H' are identical. 

To summarize, let e S IE be a constellation and let P be a process. Then 

• the constellation belongs to the pre-critical period at P if e -< Avcit 

• the constellation belongs to the interim period at P if e ^ ^vcrit cur_viewp@e < Vcrit 

• the constellation belongs to the convergent period at P if e A Avcrit cur_viewp@e > Vcrit 

The most complex constellations are the critical donation and co-donation constellations, Crit(P → 
G) and Crit(G → P). Each of these is a single transaction in H, but becomes a sequence of 
transactions in H', potentially even an empty sequence, in which case the constellation does not 
exist in H'. To prove the inductive hypothesis for one of these constellations, we need to resort 
to a second level of induction. For this purpose we will formulate sub-hypotheses that relate the 
states of H and H' at each sub-transaction. 

We build on our earlier observations to define the following equivalence between side effects. We 
use this equivalence to separate the issue of side effect type and content from the issue of the target set, 
which will be treated separately. 

Definition 26. Let e and e' be two queuing events in H and H', respectively, that queue packets k and k'. We 
say that e and e' produce an equivalent side effect if k and k' have the same packet type and 
cont(k) = cont(k') (see the definition of cont). 

The inductive hypothesis is actually a set of related hypotheses and sub-hypotheses. The main 
hypotheses are: 

• The First Pre-Critical Hypothesis, which describes how the state of a process P in H 
is related to the state of the same process in H' at the start of a pre-critical constellation. 
Notice that P ≠ ±G because -G is not a process in H while G joins post-critically in H. 

• The Second Pre-Critical Hypothesis, which describes how the states of D, G and -G in H' 
are related to each other at the start of a pre-critical constellation. Notice that in this case 
the comparison is within H', not between H and H'. 

• The Interim Non-G Hypothesis, which describes how the state of a process P ≠ G in H is 
related to the state of the same process in H' at the start of a post-critical constellation that 
occurs before P installs the critical view in H. 

• The Interim G Hypothesis, which describes how the state of G in H is related to its state in 
H' at the start of a post-critical constellation that occurs before G installs the critical view 
in H. 

• The Convergent Hypothesis, which claims that the state of a process P in H is identical to 
its state in H' at any time after P installs the critical view in H. 

The sub-hypotheses are: 

• The Donation Sub-Hypothesis, which describes how the state of G in H relates to its state in 
H' at the start of each sub-transaction of each donation constellation at G. 

• The First Co-Donation Sub-Hypothesis, which describes how the state of P ≠ G in H relates 
to its state in H' at the start of each untimely sub-transaction of each co-donation 
constellation in P. 

• The Second Co-Donation Sub-Hypothesis, which describes how the state of P ≠ G in H 
relates to its state in H' at the start of each post-critical sub-transaction of each co-donation 
constellation in P. 

Inductive Hypothesis 1 (First Pre-Critical Hypothesis). Let C be any pre-critical constellation 
and let P be a process that exists in both H and H' at the start of C. Then the state of P in H 
and the state of P in H' are identical at that point, with the following exceptions: 

MSet' = MSet ∪ {±G} 
LiveSet' = LiveSet ∪ {±G} 
ContactSet' = ContactSet ∪ {±G} 
vt'[] = vt[] ∪ {[G] = 0, [-G] = 0} 
ReceiveSet' ≡ ReceiveSet 
FwdQueue'[] ≡ FwdQueue[] ∪ {[G] = ∅, [-G] = ∅} 
WaitSet' = {(msg, index, d(iset)) | (msg, index, iset) ∈ WaitSet} 

where 

d(iset) = iset                                          if D ∉ iset 
d(iset) = iset ∪ {iset[±G] = {f = iset[D].f, b = 0}}    if D ∈ iset 

mpkt_in'[X].f = mpkt_in[X].f    if X ≠ ±G 
mpkt_in'[X].f = mpkt_in[D].f    if X = ±G 

mpkt_in'[X].b = mpkt_in[X].b    if X ≠ ±G 
mpkt_in'[X].b = 0               if X = ±G 

ghost'[] = ghost[] ∪ {[G] = ghost[D], [-G] = ghost[D]} 
flush'[] = flush[] ∪ {[G] = ghost[D], [-G] = ghost[D]} 


Notice that flush'[±G] is indeed inherited from ghost[D], not flush[D]! 
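
As an informal illustration, the exceptions of the First Pre-Critical Hypothesis can be read as a function that maps the state of P in H to the state that the hypothesis expects in H'. The Python sketch below does only that. The dictionary layout, the function names `d` and `expected_precritical_state`, and the string process names are assumptions made for the example and are not part of the protocol; ReceiveSet is left out because its relation is an equivalence rather than an explicit formula, and D is assumed to have an mpkt_in entry.

```python
from copy import deepcopy

G, NEG_G = "G", "-G"  # hypothetical names for the two extra processes of H'


def d(iset, donor):
    """Extend an index set with [±G] entries inherited from the donor D, if present."""
    if donor not in iset:
        return dict(iset)
    ext = dict(iset)
    for x in (G, NEG_G):
        ext[x] = {"f": iset[donor]["f"], "b": 0}
    return ext


def expected_precritical_state(h, donor):
    """Map a state of P in H to the state that the hypothesis expects in H'."""
    r = deepcopy(h)
    for x in (G, NEG_G):
        for name in ("MSet", "LiveSet", "ContactSet"):
            r[name].add(x)
        r["vt"][x] = 0
        r["FwdQueue"][x] = []
        r["mpkt_in"][x] = {"f": h["mpkt_in"][donor]["f"], "b": 0}
        r["ghost"][x] = h["ghost"][donor]
        r["flush"][x] = h["ghost"][donor]  # inherited from ghost[D], not flush[D]
    r["WaitSet"] = [(m, i, d(iset, donor)) for (m, i, iset) in h["WaitSet"]]
    return r
```

The one asymmetric line is the assignment to `flush`, which mirrors the remark above that the flush coordinates of ±G are taken from the ghost height of D.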

Inductive Hypothesis 2 (Second Pre-Critical Hypothesis). Let C be any pre-critical constellation. 
Then the states of ±G in H' at the start of C are identical to the state of D in H' at the same 
point, with the following exceptions: 

self(±G) = ±G 
BcastWaitSet'(±G) = ∅ 
FwdWaitSet'(±G) = {(msg, {f = index.f, b = 0}, iset) | (msg, index, iset) ∈ FwdWaitSet'(D)} 
LaunchQueue'(±G) = ∅ 
flush_height'(±G) = ghost_height'(D) 
mpkt_out'.b(±G) = 0 

Inductive Hypothesis 3 (Interim Non-G Hypothesis). Let C be any post-critical constellation 
and let P ≠ G be a process that exists at the start of C and has not yet installed the critical view in 
H. Then the state of P in H is identical to the state of P in H' at that point with the following 
exceptions: 

MSet' = MSet ∪ {±G} 
PendViewQueue' = PendViewQueue with (REMOVE, -G) replacing (JOIN, G) 
vt'[] = vt[] ∪ {[G] = 0, [-G] = 0} 
ReceiveSet' ≡ ReceiveSet 
FwdQueue'[] ≡ FwdQueue[] 
WaitSet' ≡ WaitSet 

Inductive Hypothesis 4 (Interim G Hypothesis). Let C be any post-critical constellation and 
suppose that G exists at the start of C and has not yet installed the critical view in H. Then the 
state of G in H is identical to the state of G in H' at that point with the following exceptions: 

MSet' = MSet ∪ {±G} 
PendViewQueue' = PendViewQueue with (REMOVE, -G) replacing (JOIN, G) 
ContactSet' = LiveSet 
vt'[] = vt[] ∪ {[G] = 0, [-G] = 0} 
ReceiveSet' ≡ ReceiveSet 
FwdQueue'[] ≡ FwdQueue[] 
WaitSet' ≡ WaitSet 

Inductive Hypothesis 5 (Donation Sub-Hypothesis). Let P be any process that sends a critical 
donation packet to G in H and let k be any untimely packet in PFJ. If G processes the critical 
donation packet from P in H then the state of G in H at the start of the [Crit(P —>■ G).fc‘^’^.=i=|0] 
sub-transaction is identical to the state of G in H' at the start of the matching transaction in H', 
with the following exceptions: 

MSet' = MSet ∪ {±G} 
PendViewQueue' = PendViewQueue with (REMOVE, -G) replacing (JOIN, G) 
ContactSet' = LiveSet 
vt'[] = vt[] ∪ {[G] = 0, [-G] = 0} 
ReceiveSet' ≡ ReceiveSet 
FwdQueue'[] ≡ FwdQueue[] 
WaitSet' ≡ WaitSet 

ghost{P] < ghosf[P] < ghost_heightp@g^ ^ 
flush[P] < flusP[P] < flush.heightpQi^ 

Inductive Hypothesis 6 (First Co-Donation Sub-Hypothesis). Let P ≠ G be any process that 
receives a critical co-donation packet from G in H and let k be any untimely packet in UP. If P pro¬ 
cesses the critical co-donation in H then the state of P in H at the start of the [Crit(G —>■ P).fc‘5^.*|()] 
sub-transaction in H is identical to the state of P in H' at the start of the matching transaction 
in H', with the following exceptions: 

MSet' = MSet ∪ {±G} 
PendViewQueue' = PendViewQueue with (REMOVE, -G) replacing (JOIN, G) 
vt'[] = vt[] ∪ {[G] = 0, [-G] = 0} 
ReceiveSet' ≡ ReceiveSet 
FwdQueue'[] ≡ FwdQueue[] 
WaitSet' ≡ WaitSet 

ghost[G] < ghosf[G] < ghost_heightjj^f^ 
ftush[G] < flusPlG] < ghost-heightjy^^^ ^ 


Inductive Hypothesis 7 (Second Co-Donation Sub-Hypothesis). Let P ≠ G be any process 
that receives a critical co-donation packet from G in H and let k be any uncontacted packet in 
G(3. If P processes the critical co-donation in H then the state of P in H at the start of the 
[Crit(G sub-transaction is identical to the state of P in H' at the start of the matching 
transaction in H', with the following exceptions: 

MSet' = MSet ∪ {±G} 
PendViewQueue' = PendViewQueue with (REMOVE, -G) replacing (JOIN, G) 
vt'[] = vt[] ∪ {[G] = 0, [-G] = 0} 
ReceiveSet' ≡ ReceiveSet 
FwdQueue'[] ≡ FwdQueue[] 
WaitSet' ≡ WaitSet 

ghost[G] < ghost_heightjj@g^^^^^ < ghosf[G] < ghosfheightc@crit{p^G) 
ftush[G] < ghost_heightjj@g^^^^^ < fiusP[G] < ghost_heightQ@Q^-^^i^p^Q-^ 

Inductive Hypothesis 8 (Convergent Hypothesis). For any post-critical constellation C and any 
process P that exists at the start of C, if P has already installed the critical view at that point in 
H, then the state of P in H is identical to the state of P in H' at that same point. Moreover, the 
state of P does not contain any pre-critical messages. 


6.5 Technical lemmas for the History Equivalence Theorem proof 

Lemma 24. Let e be a queuing event of a ghost, flush or message packet k at process P in 
H', and let e_0 be the original queuing event, that of origin(k), at process R in H. Assume that the 
inductive hypothesis holds and that one of the following conditions holds as well: 

1. e is pre-critical, and ContactSet'_{P@e} = ContactSet_{P@e_0} ∪ {±G} 

2. e is post-critical, P ≠ G and ContactSet'_{P@e} = ContactSet_{P@e_0} 

3. e is post-critical, P = G and ContactSet'_{P@e} = LiveSet_{P@e_0} 

Then T_e = ContactSet'_{P@e}. 

Proof. Most of the cases follow immediately from Lemma [22] and the various inductive hypotheses. 
The one non-trivial case is when e is post-critical, P = G and the Convergent Hypothesis holds. In 
that case it follows from Lemma and the inductive hypothesis that Te = LiveSet’'. We have to 
show that ContactSeC = LiveSet’'. It follows from Lemma isiia that the equality holds for original 
processes. Since G is an original process in H' we are done. □ 

Lemma 25. Let msg be an unstamped message and let P ≠ ±G be a process in H. Suppose that the 
states of P in H and H' satisfy the First Pre-Critical Hypothesis. Suppose as well that D ∈ LiveSet 
and ±G ∉ LiveSet at P in H. Then executing protBroadcast(msg) in both histories will result 
in equivalent side effects (see Definition 26) while preserving the Hypothesis. 

Proof. We follow the execution of the protBroadcast procedure step by step. 

The first step is a decision whether to proceed or to queue msg to LaunchQueue. Both histories take 
the same decision here, resulting in identical changes to LaunchQueue. If v_gap > 0 we are done. 

The next step increments mpkt_out.b, keeping it identical. 

The next two steps define a temporary vector vf. Since P ≠ ±G, the resulting vector has values 
in H and H' that bear the same relationship to each other as the values of vt do. 

The next three steps stamp the message. By the definition of message equivalence, this produces 
equivalent messages. 

The next step creates the queuing event. By Definition 26 this event produces equivalent side 
effects. 

The next step creates the local variable index which has the same value in both histories. 

The next step creates the local vector iset[], which is related in H and H' the same way mpkt_in[] 
is. 

Since D ∈ LiveSet, we know from Lemma 5 that there is a D coordinate in mpkt_in[] and therefore 
the record that is added to BcastWaitSet in the last step bears the required relation for WaitSet 
records. Therefore this step preserves the inductive hypothesis and we are done. □ 

Lemma 26. Let P and Q be two processes that use the same implementation of the ApplyMessage 
up-call. Further assume that the states of P and Q have the following relations: 

ReplicatedData(P) = ReplicatedData(Q) 
cur_view(P) = cur_view(Q) < V_crit 
MSet(P) = MSet(Q) ∪ {±G} 
ReceiveSet(P) ≡ ReceiveSet(Q) 
vt[](P) = vt[](Q) ∪ {[G] = 0, [-G] = 0} 

Then executing the Scan procedure in both processes will result in the same relations being preserved, 
with all other state variables remaining unchanged in both P and Q. Moreover both processes invoke 
the ApplyMessage up-call at the same times and with the same messages. 

Proof. We follow the execution of the Scan procedure step by step. 

The first step sets the value of deliverable_messages_found to false at both P and Q. 

The next step is a loop over members of ReceiveSet. Since ReceiveSet is equivalent at P and Q, 
the loop goes over the same messages in the same order in both executions. For each message, we 
check whether the message is a current view message and whether we have already delivered all the 
previous messages from the same source. This amounts to checking whether VIEW(msg) = cur_view 
and whether VT(msg)[ORIG(msg)] = vt[ORIG(msg)] + 1. 

The first check gives the same result in P and Q because VIEW(msg) is the same (due to equivalence) 
and cur_view is assumed to be the same. Moreover, if the check is successful it means that 
VIEW(msg) = cur_view < V_crit and therefore ORIG(msg) ≠ ±G. This is because G starts life with 
cur_view + v_gap = V_crit so it does not originate any messages before cur_view = V_crit and v_gap = 0. 
-G does not originate any messages at all. 

The second check gives the same result because of equivalence and because of our observation that 
ORIG(msg) ≠ ±G. Therefore VT(msg)[ORIG(msg)] and vt[ORIG(msg)] are the same at P and Q. 

Therefore the conditional block is executed for the same messages, following the steps: 

• all_dependents_delivered is set to true in both P and Q. 

• A loop searches MSet for coordinates whose values will prevent delivery. Other than ±G the 
set MSet in Q contains the same processes as in P and the test yields the same results in both. 
The only way this loop could produce divergent results is if Q decided that the message was 
deliverable while P found an impediment that is related to pid = ±G. But the equivalence 
condition on ReceiveSet guarantees that VT(msg)[±G] = 0 in P, so this cannot happen and 
the loop must exit with the same value of all_dependents_delivered in both executions. 

• The decision to deliver the message is controlled by all_dependents_delivered so both P and Q 
make the same decision. The message delivery takes the following steps: 

— The value of deliverable_messages_found is set to true at both P and Q. 

— The vector time is incremented at the ORIG(msg) coordinate. This is the same coordinate 
in P and Q due to equivalence, and since it is not ±G, we increment an identical value 
in vt, keeping it identical. 

— The message is removed from ReceiveSet and stripped of its vector time and view stamps, 
leaving it with only its origin stamp. Since the origin stamp is identical for equivalent 
messages, this leaves the message identical in P and Q, in addition to leaving ReceiveSet 
equivalent. 

— The identical message is applied to the identical user data object ReplicatedData in the 
same way. This leaves the data object identical. 

The last step is a recursive call to Scan controlled by deliverable_messages_found. We have shown 
that up to this point the assumed relationships have not changed and no variables changed other 
than some of the ones mentioned. Since P and Q take the same decision about making the recursive 
call, the lemma is now proven by induction. □ 
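
The walk through Scan can be made concrete with a small Python sketch. It is reconstructed only from the steps named in the proof and is not the protocol's actual code: the dictionary layout, the `payload` field, the `apply_message` callback and the exact form of the impediment test (a coordinate of the message's vector time that exceeds the local vector time) are assumptions of the example.

```python
def scan(state, apply_message):
    """Deliver every deliverable message in state["ReceiveSet"] (sketch of Scan)."""
    deliverable_messages_found = False
    for msg in list(state["ReceiveSet"]):
        orig = msg["orig"]
        # Current-view check and "next message from this origin" check.
        if msg["view"] != state["cur_view"]:
            continue
        if msg["vt"][orig] != state["vt"][orig] + 1:
            continue
        # Search MSet for a coordinate that prevents delivery (assumed test).
        all_dependents_delivered = all(
            msg["vt"].get(pid, 0) <= state["vt"][pid]
            for pid in state["MSet"]
            if pid != orig
        )
        if all_dependents_delivered:
            deliverable_messages_found = True
            state["vt"][orig] += 1
            state["ReceiveSet"].remove(msg)
            apply_message(orig, msg["payload"])
    if deliverable_messages_found:
        scan(state, apply_message)  # the recursive call of the last step
```

Running the sketch on two states that differ only by the extra members ±G, whose vector coordinates are zero both locally and in every message, yields the same sequence of up-calls, which is the behavior the lemma describes.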

Lemma 27. Let P and Q be two processes whose states have the following relations: 

FwdWaitSet(P) = ∅ iff FwdWaitSet(Q) = ∅ 
BcastWaitSet(P) = ∅ iff BcastWaitSet(Q) = ∅ 
cur_view(P) = cur_view(Q) 
v_gap(P) = v_gap(Q) 
ghost_height(P) = ghost_height(Q) 
flush_height(P) = flush_height(Q) 

Then executing the CheckFlush procedure in both processes has the following two results: 

• The same relations are preserved, with all other state variables remaining unchanged in both 
P and Q. 

• The same side effects occur in P and Q. 

Proof. We follow the execution of CheckFlush step by step. 

The first step exits if FwdWaitSet is not empty. P and Q make the same decision here by assumption. 

The next block compares ghost_height to cur_view + v_gap. If ghost_height is low it updates it and 
broadcasts ghost packets of height cur_view + v_gap. Since ghost_height, cur_view and v_gap are 
all the same in P and Q, this block results in the same side effects and preserves the postulated 
relations without changing any other variables. 

The rest of the procedure is a repeat of the first part with BcastWaitSet and flush_height replacing 
FwdWaitSet and ghost_height, so the same argument holds. □ 

Lemma 28. Let P and Q be two processes whose states have the following relations: 

FwdWaitSet(P) = ∅ iff FwdWaitSet(Q) = ∅ 
BcastWaitSet(P) = ∅ 
cur_view(P) = cur_view(Q) 
v_gap(P) = v_gap(Q) 
ghost_height(P) = ghost_height(Q) 
flush_height(P) = ghost_height(Q) 

Then executing the CheckFlush procedure in both processes has the following results: 

• The same relations are preserved, with all other state variables remaining unchanged in both 
P and Q. 

• If Q broadcasts ghost packets of height v then P broadcasts ghost packets of height v followed 
by flush packets of height v. 

• If Q does not broadcast ghost packets, then P has no side effects. 

Proof. We follow the execution of CheckFlush step by step. 

The first step exits if FwdWaitSet is not empty. P and Q make the same decision here by assumption. 
If they exit then neither has side effects and we are done. 

The next block compares ghost_height to cur_view + v_gap. If ghost_height is low it updates it and 
broadcasts ghost packets of height cur_view + v_gap. Since ghost_height, cur_view and v_gap are 
all the same in P and Q, this block results in the same side effects and preserves the postulated 
relations without changing any other variables. 

The rest of the procedure is a repeat of the first part with BcastWaitSet and flush_height replacing 
FwdWaitSet and ghost_height. If P and Q decided to broadcast ghost packets, then our assumptions 
will force P to broadcast flush packets as well, as the lemma claims. Since P's decision to 
broadcast flush packets follows Q's decision to broadcast ghost packets, the value of flush_height in 
P ends up being equal to the final value of ghost_height in Q, as claimed. □ 
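
The structure of CheckFlush that the two preceding proofs rely on can be sketched in a few lines of Python. This is a reading of the proof text, not the protocol's code: the dictionary layout and the representation of a broadcast as a (packet type, height) pair are assumptions made for the example.

```python
def check_flush(s):
    """Return the broadcasts made, as (packet_type, height) pairs; mutates s."""
    effects = []
    target = s["cur_view"] + s["v_gap"]
    # First part: ghost packets, gated by FwdWaitSet.
    if s["FwdWaitSet"]:
        return effects
    if s["ghost_height"] < target:
        s["ghost_height"] = target
        effects.append(("ghost", target))
    # Second part: the same pattern for flush packets, gated by BcastWaitSet.
    if s["BcastWaitSet"]:
        return effects
    if s["flush_height"] < target:
        s["flush_height"] = target
        effects.append(("flush", target))
    return effects
```

With states related as in Lemma 28 (BcastWaitSet empty at P and flush_height at P equal to ghost_height at Q), a ghost broadcast of height v at Q is matched at P by a ghost broadcast followed by a flush broadcast of the same height, and P stays silent whenever Q sends no ghost packets.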

Corollary 9. 1. Let P be a process in H and suppose that the state relationships in one of the 
inductive hypotheses or sub-hypotheses, excluding the Second Pre-Critical Hypothesis, hold 
with respect to states of P in H and H'. Then executing the CheckFlush procedure at P in 
H and H' preserves the same relations and causes the same side effects. 

2. Suppose that the state relationships in the Second Pre-Critical Hypothesis hold at D, G and 
-G in H'. Then executing the CheckFlush procedure in all three processes preserves the same 
relations and causes side effects that are related as stated in the corresponding lemma above. 


Proof. □ 

Lemma 29. Let X be a process in H that is in the middle of the execution of a constellation in 
both H and H' (if X = G then the constellation must be post-critical). Suppose that the First Pre-Critical, 
Interim Non-G, Interim G or Convergent Hypothesis holds with respect to the state of X in 
H and H' (if X = G then the pre-critical case does not apply). Also assume that ul_view = cur_view' 
at X in H'. Then executing the TryToInstall procedure in both histories at X has the following 
results: 

• If cur_view < V_crit in H at the end of the execution then the same inductive hypothesis (First 
Pre-Critical or Interim) still holds. 

• If cur_view ≥ V_crit in H at the end of the execution then the Convergent Hypothesis holds. 

• If X = G and the procedure installs the critical view then G launches the APP thread at exactly 
the same moment in H and H'. 

• In all cases the side effects in H' are equivalent to the side effects in H. 

Proof. We follow the execution step by step: 

The procedure starts with a loop that goes over the live processes, looking for an impediment to 
installation of the next view in the form of flush[] values that are too low. In all the situations under 
consideration we have 

cur_view' = cur_view 
v_gap' = v_gap 
LiveSet' ⊇ LiveSet 
flush'[Y] = flush[Y] whenever Y ∈ LiveSet 

Therefore if there is an impediment to installation in H, the same impediment exists in H'. The 
converse is trivially true in all but the pre-critical case, because that is the only case where LiveSet' ≠ 
LiveSet. However in that case we have, by the First Pre-Critical Hypothesis: 

flush'[±G] = ghost[D] ≥ flush[D] 

where the inequality on the right follows from an earlier lemma. Therefore if G or -G form an impediment 
in H', then so does D in H. Therefore both histories make the same decision on whether to proceed 
with installing all the pending views. 
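
The impediment argument can be checked on a toy example. The Python sketch below is only an illustration: the text says no more than that the loop looks for flush[] values that are "too low", so the parameter `required_height` and the helper name `has_impediment` are hypothetical stand-ins for whatever threshold the procedure actually uses.

```python
def has_impediment(live_set, flush, required_height):
    """True if some live process has a flush[] value below required_height."""
    return any(flush[y] < required_height for y in live_set)


# Pre-critical case: H' has two extra live processes whose flush values
# are inherited from ghost[D], and ghost[D] >= flush[D].
live_h = {"D", "A"}
flush_h = {"D": 4, "A": 5}
ghost_d = 5

live_r = live_h | {"G", "-G"}
flush_r = {**flush_h, "G": ghost_d, "-G": ghost_d}

# Both histories reach the same decision for every candidate threshold.
decisions = [
    (has_impediment(live_h, flush_h, r), has_impediment(live_r, flush_r, r))
    for r in range(0, 8)
]
```

Whenever G or -G blocks installation in H', the donor D blocks it in H as well, because its flush value is no larger than the ghost value that ±G inherited, so every pair in `decisions` has equal components.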

The next part of the procedure is a loop that installs all the pending views. It goes through the 
following steps: 

• The obsolete messages are removed from ReceiveSet. Because ReceiveSet’’ = ReceiveSet, they 
have the same messages and each message has the same message view. Therefore the same 
messages are discarded in H and iJ’’ and the hypothesis is preserved. In the case where the 
view being installed is the critical one, this step leaves ReceiveSet with no messages of pre- 
critical view. By the definition of equivalence (Definition this means that ReceiveSet’’ = 
ReceiveSet. 

• The obsolete messages are removed from FwdQueue[]. This is very similar to the previous step. 
The only complication is that in the pre-critical case there are two queues, FwdQueue’’[±G], 
that do not exist in H. However these queues are empty according to the First Pre-Critical 
Hypothesis, and so this step preserves the inductive hypothesis as well. As in the previous case, 
if the view being installed is the critical one, then at this stage FwdQueue’’[A] = FwdQueue[A] 
at every process X. 

• The next two steps increment cur^view and decrement V-gap. This obviously preserves the 
inductive hypothesis. Also uLview= cur.viev/' — 1. This will be fixed soon. 

• The next step pops the head of PendViewQueue. In most cases this trivially preserves the
inductive hypothesis, as well as yielding an identical value for notification. However in the case
where the view being installed is the critical one, this step renders PendViewQueue identical
in both histories while yielding different values for notification. In H we have a value that
indicates that G is joining while in H’’ we have a value that indicates that -G is leaving.

• The next step updates MSet according to the type of notification, and applies the change to
the replicated application data. We have to consider the following cases:

1. The notification is for a non-critical joining of a process which is not the local process
X. Since the first join in H occurs at the critical view we have

cur_view > Vcrit > MAGIC_VIEW

Both histories add the new process to MSet, preserving the inductive hypothesis. Then
in H the process X invokes the up-call ApplyJoin@X, which modifies ReplicatedData in
an unspecified user-defined way. In H’’ the process X invokes ApplyJoin’’@X. According
to 5.4.10, ApplyJoin’’@X behaves the same way in this case, regardless of whether X = G
or not. It increments ui_view and restores the equality ui_view = cur_view’’. Since ui_view
is positive, the up-call invokes the original ApplyJoin@X. Therefore ReplicatedData is
modified in the same way in H’’ and the inductive hypothesis is preserved.

2. The notification is for a non-critical joining of the local process X. This case is just like 
the previous one but in addition process X also launches the main APP thread, with the 
same identity parameter pid in both histories. We may assume that the thread will start 
executing at the exact same time in both histories (because it is possible, not because it 
is probable). 

3. The notification is for a non-critical removal of a process. In both histories X removes the
process from MSet, preserving the inductive hypothesis. Then in H it invokes the up-call
ApplyRemoval@X, which modifies ReplicatedData in an unspecified user-defined way. In
H’’ it invokes ApplyRemoval’’@X. If X ≠ G then according to 5.4.10 the up-call simply
increments ui_view, restoring the equality ui_view = cur_view’’, and then calls
ApplyRemoval@X, thus preserving the inductive hypothesis. If X = G then ApplyRemoval’’@G
increments ui_view (restoring the equality with cur_view’’) and then invokes either
ApplyRemoval@G or ApplyRemoval@D, according as ui_view > MAGIC_VIEW or not. By
Definition [M] ui_view > MAGIC_VIEW exactly when the constellation is post-critical,
which must be the case here when X = G. Therefore ApplyRemoval@G is invoked and
ReplicatedData is modified the same way in both histories.

4. The notification is for the critical view and X ≠ G. In this case process X in H
invokes the up-call ApplyJoin@X and in H’’ the up-call ApplyRemoval’’@X. By 5.4.10
the ApplyRemoval’’@X call increments ui_view, thus restoring the equality ui_view =
cur_view’’. The up-call proceeds to invoke ApplyJoin@X. This modifies ReplicatedData
the same way as in H.

5. The notification is for the critical view and X = G. In this case process G in H invokes
the up-call ApplyJoin@G and then launches the APP thread with pid = G. In H’’ it
invokes ApplyRemoval’’@G. By 5.4.10 the ApplyRemoval’’@G call increments ui_view, thus
restoring the equality ui_view = cur_view’’. The up-call proceeds to invoke ApplyJoin@G
with pid = G exactly as G did in H. This preserves the equality of ReplicatedData.

In H’’ the APP thread is not launched. But since G is an original process in H’’ the
thread was already launched, with the same parameter, at the beginning of time when
the protStart procedure was invoked. Since both invocations are asynchronous we may
assume that by sheer luck the early invocation of the thread in H’’ is delayed so much
that the thread starts execution at the exact same point in time in both histories.

• The vector time is reset. This means that the previous vector time is replaced with a vector 
of zeroes, one per process in MSet. This is easily seen to preserve the inductive hypothesis. 

At this point we have to take stock of the case where the view installation was the critical 
one. This is the boundary between the Interim period, where the Interim Hypotheses are in 
force, and the Convergent period. Following the steps we took so far demonstrates that all the
differences that existed in the state of the process in the two histories have now dissolved. The
±G difference in MSet has been bridged. As a result vt converged as well. PendViewQueue 
shed the one record that was different and became equal. ReceiveSet and FwdQueue[] have 
gone from being equivalent to being equal, as we have seen. 

Since we managed to install the views we know that the self flush height is high, namely

flush[X] = cur_view + v_gap

From Lemma IMS!) it follows that flush_height = cur_view + v_gap. Since we start out with
v_gap > 0 (otherwise no views are installed) we know by Lemma [51P|) that WaitSet must
be empty at this point in both histories and therefore equal as well. The only remaining
possible difference is in ContactSet, in the case X = G only. But this difference cannot
exist here because Lemma |S1IS|) implies that if Y ∈ LiveSet \ ContactSet then flush[Y] <
cur_view + v_gap. This would have prevented the views from being installed. As a result the
Convergent Hypothesis holds, and the computations have now converged for X.

• The next step is an invocation of the Scan procedure. By Lemma 26 this step preserves the
inductive hypothesis and creates no side effects.

The last step involves broadcasting all the messages out of LaunchQueue.

At this point v_gap = 0 and therefore we cannot be in the interim period. Therefore we only
need to consider the First Pre-Critical and the Convergent Hypotheses. Under either hypothesis
LaunchQueue’’ = LaunchQueue and by Lemma 25 each call to protBroadcast preserves the hypothesis
and generates equivalent side effects in both histories. From Lemma 23 it follows that the target
set of each side effect is equal to ContactSet’’ in H’’. □
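
The view-installation loop traced in the proof above can be summarized with a minimal Python sketch. It is a simplified model, not the protocol's TryToInstall listing: the dictionary keys, the record shapes and the rule that a message is obsolete when its view is older than the view being installed are all assumptions made for illustration, and the FwdQueue cleanup, the user up-calls, the Scan call and the final LaunchQueue broadcast are omitted.

```python
from collections import deque


def install_pending_views(state):
    """Install every pending view, following the steps listed in the proof."""
    while state["v_gap"] > 0:
        next_view = state["cur_view"] + 1
        # Assumed rule: drop messages stamped with a view older than next_view.
        state["receive_set"] = [
            m for m in state["receive_set"] if m["view"] >= next_view
        ]
        # Increment cur_view and decrement v_gap in tandem.
        state["cur_view"] = next_view
        state["v_gap"] -= 1
        # Pop the head of PendViewQueue and update MSet accordingly.
        kind, pid = state["pend_view_queue"].popleft()
        if kind == "JOIN":
            state["mset"].add(pid)
        else:
            state["mset"].discard(pid)
        # Reset the vector time: one zero per process in MSet.
        state["vt"] = {p: 0 for p in state["mset"]}
    return state
```

Running the same function on two states that differ only as the inductive hypothesis allows is a convenient way to experiment with which differences survive each step.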

Lemma 30. Suppose that G, -G and D are in the middle of the execution of a pre-critical
constellation in H’’. Suppose that the Second Pre-Critical Hypothesis holds. Also suppose that
ui_view = cur_view’’ in all three processes. Then executing the TryToInstall procedure in all three
processes preserves the inductive hypothesis and produces no side effects in either G or -G.

Proof. We follow the execution step by step: 

The procedure starts with a loop that goes over the live processes, looking for an impediment to
installation of the next view in the form of flush[] values that are too low. In the situation under
consideration we have the same values of cur_view, v_gap, LiveSet and flush[]. Therefore all three
processes reach the same decision on installation of the pending views. Since PendViewQueue is
also identical among the processes, the same views get installed.

The next part of the procedure is a loop that installs all the pending views. It goes through the 
following steps: 

• The obsolete messages are removed from ReceiveSet and FwdQueue[]. Because both sets are
identical in all three processes, the same messages are discarded in all three processes and the
hypothesis is preserved.

• The next two steps increment cur_view and decrement v_gap. This obviously preserves the
inductive hypothesis. It also results in ui_view = cur_view’’ − 1.

• The next step pops the head of PendViewQueue. This trivially preserves the inductive
hypothesis, as well as yielding an identical value for notification.

• The next step updates MSet according to the type of notification, and applies the change
to the replicated application data. Since we are dealing with a pre-critical constellation, the
view change must be a removal of a process, and by Definition [M] cur_view < MAGIC_VIEW.
Process D executes ApplyRemoval’’@D while ±G execute ApplyRemoval’’@G. By 5.4.10 this
results in all cases in incrementing ui_view, which restores the equality ui_view = cur_view’’,
and in invoking ApplyRemoval@D, which modifies ReplicatedData in the same way in all three
processes.

• The vector time is reset. This means that the previous vector time is replaced with a vector
of zeroes, one per process in MSet. This is easily seen to preserve the inductive hypothesis.

• The next step is an invocation of the Scan procedure. By Lemma 20 this step preserves the
inductive hypothesis and creates no side effects.

The last step involves broadcasting all the messages out of LaunchQueue. By the Second
Pre-Critical Hypothesis LaunchQueue is empty in ±G, so this step generates no side effects there, while
generating a single original message broadcast per message in D. As far as the inductive hypothesis
is concerned, for every message in LaunchQueue that is broadcast by D the value of mpkt_out.b is
incremented and a copy of the stamped message is attached to BcastWaitSet in D together with
an instability vector. In addition LaunchQueue is emptied out in D. These state changes do not
violate the Second Pre-Critical Hypothesis, which only requires that BcastWaitSet and LaunchQueue
be empty and mpkt_out.b be zero in ±G. □
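
The way the wrapped removal up-call dispatches, as described in the proofs of Lemmas 29 and 30, can be sketched as follows. This is a hypothetical model and not the definition given in 5.4.10: the value of MAGIC_VIEW, the function name, and in particular the exact boundary comparisons (pre-critical below MAGIC_VIEW, critical at MAGIC_VIEW, post-critical above it, all tested after the increment) are assumptions. The function only reports which original up-call would be invoked.

```python
MAGIC_VIEW = 5  # hypothetical value, chosen only for this illustration


def apply_removal_wrapped(state, pid, d_pid="D"):
    """Model of ApplyRemoval'' at process pid.

    Returns (original_up_call_name, identity_it_is_invoked_as).
    """
    # Every invocation first increments ui_view, restoring
    # the equality ui_view == cur_view''.
    state["ui_view"] += 1
    view = state["ui_view"]
    if view < MAGIC_VIEW:
        # Assumed pre-critical case: G and -G shadow D.
        acting_as = d_pid if pid in ("G", "-G") else pid
        return ("ApplyRemoval", acting_as)
    if view == MAGIC_VIEW:
        # Assumed critical case: the removal of -G stands in for the join of G.
        return ("ApplyJoin", pid)
    # Post-critical case: behave like the original up-call.
    return ("ApplyRemoval", pid)
```

Under these assumptions the replicated data is always modified by the same original up-call in H and H’’, which is what each case of the proofs verifies.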


6.6 Proof of the History Equivalence Theorem 


We start the proof at the beginning of time. At time zero, each member process is initialized by
the protStart procedure (see 3.2.5). Direct inspection shows that the state differences between H
processes and H’’ processes conform to the pre-critical inductive hypotheses. Notice also that after
protStart is executed in H’’

ui_view = cur_view’’ = 0

It is easy to check that the equality between ui_view and cur_view’’ continues to hold as long as
H’’ executes CBCAST. This is because the identity is passed from parent to child, and the only
place where either value is changed is in the TryToInstall procedure, where these two values are
incremented in tandem.

Suppose that the inductive hypothesis holds at constellation label L = [^t.s.OjO]. Both H and
H’’ have a set of transactions or sub-transactions which share the constellation label L. But in H these
transactions are generated by the execution of CBCAST procedures as a reaction to triggers, whereas
in H’’ the transactions are manufactured artificially. We have to show three things. First we must
show that if H’’ executes the CBCAST protocol it will produce transactions that are identical to its
observed ones. Then we have to show that if H’’ executes the CBCAST protocol, then once all the
(sub-)transactions in the constellation conclude the inductive hypothesis continues to hold. As a
final step we have to show that the triggers of the next constellation in H’’ arise naturally from the
past behavior of H’’. This guarantees that H’’ continues to look like an execution of CBCAST and
APP all the way up to the next constellation, and that the inductive hypothesis continues to hold
until that point.

We divide our analysis at each stage into cases, depending on the type of constellation, and the
type of side effects (see 4.2.1 - 4.2.5) we observe in H within the constellation. In each case we
try to deal with all of the inductive hypotheses together. In some cases only some of the inductive 
hypotheses are relevant. For example, join notifications do not occur pre-critically, by definition. 

We also make a repeated use of arguments that show that certain variables that start out equivalent 
at the start of a constellation end up remaining equivalent at the end. In the Convergent period 
these variables are required to be identical and not just equivalent, so proving that they remain 
equivalent is not quite enough. But as long as the procedural parameters that are used by CBCAST 
to execute each transaction are identical in both histories, the state will remain identical at the end 
rather than just equivalent. We will point out the few cases where the procedural parameters are 
not equal, and otherwise gloss over this issue. 


6.6.1 Notification constellations 

• Type of Constellation: A non-critical GMS removal notification of a process R.

H Transactions: One execution of the protRemove procedure at each surviving process.

Observed behavior in H’’: Post-critically, the transactions occurring in H’’ are equivalent
to the H transactions occurring at the same processes. Pre-critically, equivalent
transactions occur at the non-G processes, and in addition there are two transactions at ±G.
These transactions have the same triggers as the other transactions in the
constellation, and their side effects are related to the original transaction at D according to the
following cases (see Lemma fT^):

— If the original D transaction in H queues a sequence of message multicasts then the
±G transactions in H’’ queue the equivalent message multicasts.

— If the original D transaction in H has no side effects then the ±G transactions in
H’’ have no side effects.

— If the original D transaction in H queues a ghost multicast (with or without a
succeeding flush multicast) then the ±G transactions in H’’ queue a ghost multicast
followed by a flush multicast, of the same height as the original ghost multicast.

Execution of CBCAST in H’’: We follow the execution step by step, dealing with all the
inductive hypotheses at once. The first step increments v_gap in all the executions. By the
inductive hypothesis v_gap is identical at all compared processes. After incrementation,
v_gap remains identical since all the involved processes execute the procedure. Therefore
the first step preserves the inductive hypothesis. This step produces no side effects.

The next step adds a record to PendViewQueue. In the pre-critical and convergent
periods PendViewQueue’’ = PendViewQueue and the added record is identical as well, so
the First Pre-Critical Hypothesis and the Convergent Hypothesis are preserved. In the
interim period PendViewQueue’’ is not identical to PendViewQueue but the difference is
preserved after the addition of the new record, so the Interim Hypotheses are preserved
as well. Finally, pre-critically PendViewQueue’’ is identical at D, G and -G so the Second
Pre-Critical Hypothesis is also preserved.

The next two steps remove R from LiveSet and ContactSet. Pre-critically both LiveSet’’
and ContactSet’’ differ from LiveSet and ContactSet by the addition of ±G, and both sets
are identical at D, G and -G in H’’. Since R ∉ {D, G, -G} in the pre-critical case, the First
and Second Pre-Critical Hypotheses are preserved. Post-critically LiveSet’’ = LiveSet and
this property is preserved. Away from G the same is true for ContactSet’’. Therefore the
Interim non-G and Convergent Hypotheses are preserved. At G, ContactSet’’ = LiveSet,
so the Interim G Hypothesis is preserved as well.

The next step is a loop that stabilizes messages in WaitSet with respect to R.
Post-critically WaitSet’’ = WaitSet in all cases. The loop preserves this relationship, which
guarantees the preservation of all the post-critical inductive hypotheses.

Pre-critically, BcastWaitSet’’ is empty at ±G while FwdWaitSet’’ is identical at D, G and
-G with the exception of a slightly different index at each record. The loop preserves these
relationships, which guarantees the preservation of the Second Pre-Critical Hypothesis.

Pre-critically at each non-G process WaitSet’’ is very close to being equivalent to WaitSet,
with the only differences being larger instability vectors. We know that in the pre-critical
case R ∉ {D, G, -G} and therefore removing R from the instability sets in WaitSet’’ and
WaitSet does not disturb their hypothesized relationship. Finally this fact also guarantees
that each instability set will empty out in WaitSet if and only if it empties in WaitSet’’,
so the loop will always make the same decisions on record removal from WaitSet in both
histories. Therefore the First Pre-Critical Hypothesis is preserved as well.

The next step discards mpkt_in[R]. This preserves all the hypotheses.

The next step forwards all the messages out of the forwarding queue FwdQueue[R].
In all periods and at all participating processes FwdQueue’’[R] = FwdQueue[R], while
pre-critically FwdQueue’’[R] is identical at D, G and -G. Therefore the loop proceeds
over equivalent or identical messages (depending on which hypothesis we are checking),
executing the following steps at each message:

— Popping the head of FwdQueue[R], yielding a message msg. This preserves all the
hypotheses, and msg’’ = msg at all processes in all periods. Pre-critically at D, G, and
-G the value of msg’’ is the same.

— Incrementing mpkt_out.f. This preserves all the hypotheses.

— Creating index = mpkt_out. This yields index’’ = index for all processes in all periods.
Pre-critically at D, G and -G this yields:

index’’.f(±G) = index’’.f(D)
index’’.b(±G) = 0

— Creating an instability vector iset = mpkt_in[]. This yields iset’’ = iset for all
processes in all the post-critical periods. Pre-critically it yields an identical value of
iset’’ at D, G and -G. Pre-critically this also yields the following relation at each
non-G process:

iset’’[X].f = iset[X].f   if X ≠ ±G
iset’’[X].f = iset[D].f   if X = ±G

iset’’[X].b = iset[X].b   if X ≠ ±G
iset’’[X].b = 0           if X = ±G

— Queuing a message multicast containing the message msg and targeted at ContactSet.
Since msg’’ = msg this side effect matches the observed side effect in H’’. Since the
inductive hypothesis holds at this point, Lemma 23 guarantees that the target set
matches the observed target set.

— Adding a record (msg, index, iset[]) to FwdWaitSet. Post-critically FwdWaitSet’’ =
FwdWaitSet at every process and the new record is equivalent as well, which preserves
the relevant hypothesis.

Pre-critically at D, G and -G in H’’, FwdWaitSet’’ is almost identical, except for a
zeroed out .b coordinate at each index at ±G. The new record (msg’’, index’’, iset’’[])
exhibits exactly this behavior at the three processes so the Second Pre-Critical
Hypothesis is preserved.

Pre-critically at any non-G process WaitSet’’ is nearly equivalent to WaitSet, with
the addition of ±G coordinates to iset’’ wherever a D coordinate exists in iset. The
value of iset’’ in the new record in H’’ exactly matches the required difference from
the value of iset in the new record in H, while msg’’ = msg and index’’ = index.
Therefore the First Pre-Critical Hypothesis is preserved.

In the next few steps we discard FwdQueue[R], ghost[R] and flush[R], actions which
preserve all the inductive hypotheses. The one thing to notice is that this follows from
the fact that pre-critically R ∉ {D, G, -G}.

In the last step we call CheckFlush, which by Corollary 5 preserves the inductive
hypotheses. Moreover the same corollary guarantees that the side effects produced by
CheckFlush are equal, and therefore equivalent, to the observed side effects. Lemma 23
guarantees that the target set produced by the execution of CheckFlush for each side
effect, namely ContactSet’’, is equal to the observed target set.
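
The pre-critical relation between the instability vectors in the two histories can be checked on a toy example. The sketch below is an illustration only: it represents each coordinate as a hypothetical (f, b) pair, and the helper names are not taken from the protocol. It shows that removing a process R outside {D, G, -G} from both vectors leaves the relation intact, and that one vector empties exactly when the other does.

```python
def expected_iset_h2(iset, d="D"):
    """Build the H'' instability vector implied by the H vector.

    Wherever a D coordinate exists, G and -G coordinates are added
    with the same f field and a zero b field.
    """
    out = dict(iset)
    if d in iset:
        f, _b = iset[d]
        out["G"] = (f, 0)
        out["-G"] = (f, 0)
    return out


def remove_process(iset, r):
    """Drop the coordinate of process r, as the stabilization loop does."""
    return {p: c for p, c in iset.items() if p != r}


iset_h = {"D": (2, 1), "R": (4, 0)}
iset_h2 = expected_iset_h2(iset_h)

# Removing R commutes with the H -> H'' correspondence.
relation_kept = (
    expected_iset_h2(remove_process(iset_h, "R"))
    == remove_process(iset_h2, "R")
)
```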

• Type of Constellation: A non-critical GMS join notification of a process J, with parent
process K. By definition this must be a post-critical constellation with J ≠ G.

H Transactions: One execution of the protJoin procedure at each existing process. One
execution of the protRun procedure at J. J and K have identical states at the start of
the transaction.

Observed behavior in H’’: The transactions at all the processes, including J, are
equivalent to the H transactions at the same processes.

Execution of CBCAST in H’’: J and K start with the same state, and the same calls are
executed in H’’ as in H, at the same processes. Since there is no pre-critical case here,
we only have to verify the preservation of the Interim non-G, Interim G and Convergent
Hypotheses. We follow the execution step by step, starting with protJoin.


The first few steps increment v_gap, add a record to PendViewQueue, add J to LiveSet
and ContactSet and create a new and empty entry in FwdQueue for the process J. These
steps can be easily seen to preserve all three Hypotheses where they apply.

The next step goes over WaitSet, creating an instability coordinate for J whenever
such a coordinate exists for the parent process K. Only the forwarding part of the
coordinate is copied. The broadcasting part is zeroed. Post-critically, WaitSet’’ ~ WaitSet
so the loop proceeds identically in H and H’’ and preserves the applicable inductive
hypotheses.

The next three steps create J coordinates in the ghost[], flush[] and mpkt_in[] vectors.
These steps can be easily seen to preserve the applicable hypotheses.

The next two steps create and queue a donation packet to J. Since all the ingredients of
the donation vector donation are equivalent, the resulting side effect in H’’ is equivalent
to the corresponding side effect in H, as observed in H’’. The target set of the packet is
equal to {J} in both histories.

In the last step we call CheckFlush, which by Corollary 5 preserves the inductive
hypotheses. Moreover the same corollary guarantees that the side effects produced by
CheckFlush are equal, and therefore equivalent, to the observed side effects. Lemma 23
guarantees that the target set produced by the execution of CheckFlush for each side
effect, namely ContactSet’’, is equal to the observed target set.

We now look at the execution steps in protRun. These steps are taken by J in both
H and H’’. Recall from 3.3 that a new process starts out with a state identical to its
parent K. Therefore, prior to the execution of protRun the states of J in H and H’’
conform to the inductive hypothesis for the post-critical state of K, which can vary
according as K = G or K ≠ G. However, if one looks at these two cases in the inductive
hypothesis, one sees that they claim the same things, except that in the case K ≠ G,
ContactSet’’ = ContactSet whereas in the case K = G, ContactSet’’ = LiveSet. This will
make no difference in the end because the protRun procedure recomputes ContactSet in
a way that renders its value identical in both H and H’’, as is expected since J ≠ G.

The first three steps in protRun increment v_gap and update PendViewQueue and LiveSet.
These steps are easily seen to preserve either the Interim Non-G Hypothesis or the
Convergent Hypothesis, as the case may be. The Interim G Hypothesis is mostly preserved
except that now ContactSet’’ ≠ LiveSet. This violation is corrected in the next step.

The next step resets ContactSet to include the process identifier of J only. This step
forces ContactSet to be identical in H and H’’. This step preserves the Interim Non-G
and the Convergent Hypotheses. If the Interim G Hypothesis applied at the start
of the constellation, now the non-G Hypothesis would apply, which is the appropriate
hypothesis for J.

All the following steps except the last one rather trivially preserve the appropriate
Hypothesis (either the Non-G Hypothesis or the Convergent Hypothesis).

In the last step we call CheckFlush, which by Corollary 5 preserves the inductive
hypotheses. Moreover the same corollary guarantees that the side effects produced by
CheckFlush are equal, and therefore equivalent, to the observed side effects. Lemma 23
guarantees that the target set produced by the execution of CheckFlush for each side
effect, namely ContactSet’’, is equal to the observed target set.
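
The WaitSet step of protJoin described above (copy the forwarding part of the parent's instability coordinate, zero the broadcasting part) can be sketched in Python. The record layout, with each coordinate stored as an (f, b) pair under the key "iset", and the function name are assumptions made for this illustration.

```python
def add_join_coordinates(wait_set, parent, joiner):
    """Give the joiner a coordinate wherever the parent has one.

    Only the forwarding field f is copied; the broadcasting field b
    is set to zero.
    """
    for record in wait_set:
        iset = record["iset"]
        if parent in iset:
            f, _b = iset[parent]
            iset[joiner] = (f, 0)
    return wait_set


wait_set = [
    {"msg": "m1", "iset": {"K": (3, 2), "A": (1, 1)}},
    {"msg": "m2", "iset": {"A": (1, 0)}},
]
add_join_coordinates(wait_set, parent="K", joiner="J")
```

Since the step depends only on the parent's coordinates, applying it to equivalent wait sets in H and H’’ yields equivalent results, which is the observation the proof relies on.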


• Type of Constellation: The constellation is the critical GMS notification.

H Transactions: One execution of the protJoin procedure at each existing process. One
execution of the protRun procedure at G. G and D have identical states at the start of
the transaction.

Observed behavior in H’’: The transactions at all the processes, including G, are
equivalent to their counterparts in H with the exceptions that in H’’ the trigger notification is
a removal notification rather than a join notification, and the donation packet queuing
event at non-G processes in H is missing in H’’.

Execution of CBCAST in H’’: The surviving processes, including G, execute the protRemove
procedure.

Not surprisingly, this case is more subtle than the other cases. First, the processes
execute a different call in H and H’’. Second, the critical notification separates the
pre-critical period from the interim period. Therefore, when we argue that the constellation
preserves the inductive hypothesis, we are really saying that if the First and Second
Pre-Critical Hypotheses hold before the execution of the respective calls, then the Interim G
and Interim non-G Hypotheses hold after the execution.

There is one last complication. Every process that participates in the critical
constellation executes exactly one of three procedures. In H’’, every process executes the
protRemove procedure. In H, every process except G executes the protJoin procedure
while G executes the protRun procedure.

The fact that protRemove replaces protJoin at non-G processes explains the observed
absence of a queuing of donation packets at those processes in H’’.


Each of these calls ends with a call to CheckFlush, which is known to preserve all the
inductive hypotheses (see Corollary 5). Therefore in the following analysis we are going
to ignore the call to CheckFlush. We will show that the inductive hypothesis is preserved
if each process pauses just before executing this call. Then Corollary 5 together with
Lemma 23 will complete the argument.

Due to the fact that different code is executed in H and H’’ we cannot use our usual
method of following the parallel execution step by step. Instead we prove this case one
state variable at a time. For each variable we show that if the pre-critical hypotheses
hold at the start of the execution, then the interim hypotheses (G and non-G) hold for
that variable at the point where CheckFlush is invoked.


cur_view

Non-G case: Initially cur_view’’ = cur_view according to the First Pre-Critical
Hypothesis. Both the protJoin and the protRemove procedures leave it unchanged, so
it remains equal at the end of the constellation, as required by the Interim non-G
Hypothesis.

G case: Initially

cur_view’’(G) = cur_view’’(D)

according to the Second Pre-Critical Hypothesis. At the same time

cur_view’’(D) = cur_view(D)

according to the First Pre-Critical Hypothesis, and

cur_view(D) = cur_view(G)

because G starts life with a state that is identical to the state of its parent (see 3.3).
Therefore

cur_view’’(G) = cur_view(G)

at the start of the constellation. Since the current view is not changed by any of the
three procedures, it remains equal at the end of the constellation, as required by the
Interim G Hypothesis.

Since this argument will come up repeatedly, we will refer to it as the sameness
argument.


v_gap

Non-G case: Initially v_gap’’ = v_gap. Both the protJoin and the protRemove
procedures increment the view gap, so it remains equal as required.

G case: Initially v_gap’’(G) = v_gap(G) by the sameness argument. Both the protRun
and the protRemove procedures increment the view gap, so it remains equal as
required.


MSet

Non-G case: Initially MSet’’ = MSet ∪ {±G}. Neither the protJoin procedure nor
the protRemove procedure changes this variable, so the same relation continues to
hold at the end of the constellation, as required.

G case: Initially MSet’’ = MSet ∪ {±G} as can be easily seen by a slight modification
of the sameness argument. Neither the protRun procedure nor the protRemove
procedure changes this variable, so the same relation continues to hold at the end of
the constellation, as required.


PendViewQueue

Non-G case: Initially PendViewQueue’’ = PendViewQueue. The protJoin procedure
appends a (JOIN, G) record to PendViewQueue, while the protRemove procedure
adds a (REMOVE, -G) record to PendViewQueue’’, which is exactly what is required by
the Interim non-G Hypothesis.

G case: Initially PendViewQueue’’ = PendViewQueue by the sameness argument.
The protRun procedure appends a (JOIN, G) record to PendViewQueue, while the
protRemove procedure adds a (REMOVE, -G) record to PendViewQueue’’, which is
exactly what is required by the Interim G Hypothesis.


LiveSet

Non-G case: Initially LiveSet’’ = LiveSet ∪ {±G}. The protJoin procedure adds G
to LiveSet, while the protRemove procedure removes -G from LiveSet’’, so LiveSet’’
becomes equal to LiveSet as required by the Interim Non-G Hypothesis.

G case: Initially LiveSet’’ = LiveSet ∪ {±G} as can be easily seen by a slight
modification of the sameness argument. The protRun procedure adds G to LiveSet while
the protRemove procedure removes -G from LiveSet’’, so LiveSet’’ becomes equal to
LiveSet as required.


ContactSet

Non-G case: Initially ContactSet’’ = ContactSet ∪ {±G}. The protJoin procedure
adds G to ContactSet, while the protRemove procedure removes -G from ContactSet’’,
so ContactSet’’ becomes equal to ContactSet as required.

G case: It follows from Lemma 151^]) that initially ContactSet’’ = LiveSet’’. The
protRemove procedure removes -G from both LiveSet’’ and ContactSet’’, keeping
them equal. We already know that at the end of the constellation LiveSet’’ = LiveSet
and therefore we end up with ContactSet’’ = LiveSet, as required by the Interim G
Hypothesis.
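
The non-G convergence of LiveSet and ContactSet at the critical constellation amounts to a simple set identity, which the following Python sketch checks on sample data. The process names and the function critical_step are hypothetical; the sketch models only the two set updates and none of the other protJoin or protRemove steps.

```python
def critical_step(live_h, contact_h):
    """Apply the critical notification to a non-G process in H and H''.

    Returns ((LiveSet, ContactSet) in H, (LiveSet'', ContactSet'') in H'').
    """
    g, neg_g = "G", "-G"
    # Pre-critical hypothesis: the H'' sets carry both G and -G in addition.
    live_h2 = live_h | {g, neg_g}
    contact_h2 = contact_h | {g, neg_g}
    # In H, protJoin adds G. In H'', protRemove removes -G.
    after_h = (live_h | {g}, contact_h | {g})
    after_h2 = (live_h2 - {neg_g}, contact_h2 - {neg_g})
    return after_h, after_h2


after_h, after_h2 = critical_step({"A", "D"}, {"A"})
```

After the step both pairs coincide, so the ±G difference in these two variables disappears at the start of the interim period.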

vt[]

Non-G case: Initially vt’’[] = vt[] ∪ {[G] = 0, [-G] = 0}. Neither the protJoin
procedure nor the protRemove procedure makes changes to vt[], so the relation remains
the same, as required.

G case: Initially vt’’[] = vt[] ∪ {[G] = 0, [-G] = 0} as can be easily seen by a slight
modification of the sameness argument. Neither the protRun procedure nor the
protRemove procedure makes changes to vt[], so the relation remains the same, as
required.


ReplicatedData

Non-G case: Initially ReplicatedData’’ = ReplicatedData. Neither the protJoin
procedure nor the protRemove procedure invokes any of the user up-calls (see 2.3.3), so
the relation remains the same, as required.

G case: Initially ReplicatedData’’ = ReplicatedData by the sameness argument.
Neither the protRun procedure nor the protRemove procedure invokes any of the user
up-calls, so the relation remains the same, as required.


ReceiveSet

Non-G case: Initially ReceiveSet’’ = ReceiveSet. Neither the protJoin procedure nor
the protRemove procedure makes changes to ReceiveSet, so the relation remains the
same, as required.

G case: Initially ReceiveSet’’ = ReceiveSet by the sameness argument. Neither the
protRun procedure nor the protRemove procedure makes changes to ReceiveSet, so
the relation remains the same, as required.

FwdQueue[]

Non-G case: Initially FwdQueue’’[] = FwdQueue[] ∪ {[G] = ∅, [-G] = ∅}. In H’’ the
protRemove procedure forwards all the messages in FwdQueue’’[-G] and removes it.
Since this queue is empty, the overall effect is simply the removal of the empty queue
without any side effects. In H the protJoin procedure adds an empty queue for G.
Therefore FwdQueue’’ becomes equivalent to FwdQueue as required.

G case: Initially FwdQueue’’[] = FwdQueue[] ∪ {[G] = ∅, [-G] = ∅} by the sameness
argument. In H’’ the protRemove procedure removes the empty queue for -G without
creating any side effects. In H the protRun procedure adds an empty queue for G.
Therefore in this case as well the Interim G Hypothesis holds and no side effects are created.
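
The queue bookkeeping in the Non-G case can be checked mechanically. The following is a
minimal sketch and not part of the protocol: the dictionary representation of FwdQueue[]
and the helper names prot_remove and prot_join are assumptions made only for illustration.

```python
# Illustrative model only: forwarding queues are lists keyed by process name.

def prot_remove(fwd_queue, proc):
    """Forward everything queued for `proc`, then drop its queue (assumed behaviour)."""
    forwarded = list(fwd_queue[proc])
    remaining = {p: q for p, q in fwd_queue.items() if p != proc}
    return remaining, forwarded


def prot_join(fwd_queue, proc):
    """Add an empty forwarding queue for a newly joined process."""
    joined = dict(fwd_queue)
    joined[proc] = []
    return joined


fwd_h = {"D": ["m1"], "X": []}            # FwdQueue[] in H
fwd_h2 = {**fwd_h, "G": [], "-G": []}     # FwdQueue''[] in the modified history

fwd_h2_after, side_effects = prot_remove(fwd_h2, "-G")   # executed in H''
fwd_h_after = prot_join(fwd_h, "G")                      # executed in H
```

Because the -G queue is empty, prot_remove forwards nothing, and the two resulting
dictionaries coincide.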


WaitSet, which includes BcastWaitSet and FwdWaitSet

Non-G case: By the First Pre-Critical Hypothesis, the wait set contains equivalent
messages in H’’ and H, with a slight difference in their instability sets. Namely,
whenever the instability in H has a D coordinate, the instability in H’’ has ±G
coordinates in addition. These coordinates have the same f field as the D coordinate
and a zero b field. In H, the protJoin procedure adds the exact same G instability
wherever a D instability exists, while in H’’ the protRemove procedure removes the
-G instability. If a message becomes stable as a result, the protRemove procedure
removes the message from WaitSet. This however cannot happen because the First
Pre-Critical Hypothesis guarantees that the -G instability is accompanied by G and
D instabilities. Therefore WaitSet’’ becomes equivalent to WaitSet as required by
the Interim Non-G Hypothesis.
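
As an illustration of the Non-G argument, the sketch below models a single instability set
as a dictionary from process name to an (f, b) pair. The representation and the function
names are hypothetical; only the two rules stated above (protJoin adds a G entry next to
every D entry, protRemove deletes the -G entry) are taken from the text.

```python
# Instability sets modelled as dicts: process -> (f, b). Names are hypothetical.

def add_g_instability(iset):
    """H side (protJoin): next to a D instability add G with the same f and b = 0."""
    out = dict(iset)
    if "D" in out:
        out["G"] = (out["D"][0], 0)
    return out


def remove_instability(iset, proc):
    """H'' side (protRemove): drop the instability of the removed process."""
    return {p: v for p, v in iset.items() if p != proc}


iset_h = {"D": (3, 1), "X": (2, 0)}                 # instability set in H
iset_h2 = {**iset_h, "G": (3, 0), "-G": (3, 0)}     # same record in H''

iset_h_after = add_g_instability(iset_h)
iset_h2_after = remove_instability(iset_h2, "-G")
```

The record in H’’ still carries its D and G entries after the removal, so the message does
not become stable and the two sets agree.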


G case: In this case we have to analyze the two parts of WaitSet separately.

We start with BcastWaitSet. By the Second Pre-Critical Hypothesis BcastWaitSet’’
starts out empty. The protRemove procedure does not change this fact, and therefore
BcastWaitSet’’ ends up empty. On the other hand, in H process G executes
the protRun procedure, which empties BcastWaitSet. So both BcastWaitSet’’ and
BcastWaitSet end up empty and therefore equivalent, as required by the Interim G
Hypothesis.

The case of FwdWaitSet is more complicated. First we verify that FwdWaitSet’’
and FwdWaitSet contain equivalent messages. Indeed it follows from the Second
Pre-Critical Hypothesis that at the start of the transaction FwdWaitSet’’(G) and
FwdWaitSet’’(D) contain the same messages. By the First Pre-Critical Hypothesis
FwdWaitSet’’(D) and FwdWaitSet(D) contain equivalent messages. The protRemove
procedure does not remove any messages from FwdWaitSet’’(G) because any message
in FwdWaitSet’’(G) that has a -G instability also has D and G instabilities. On
the other hand FwdWaitSet(G) is initially equal to FwdWaitSet(D) and therefore
has messages equivalent to the ones in FwdWaitSet’’(G). The protRun procedure
does not remove any messages from FwdWaitSet(G). Therefore at the end of the
constellation FwdWaitSet’’(G) and FwdWaitSet(G) contain equivalent messages.

Let msg be any message in FwdWaitSet(D) at the critical moment. Suppose that
its record there is (msg, index, iset). By the Self Channel Axiom iset does not
contain any D instability and therefore it follows from the First Pre-Critical
Hypothesis that the record of msg in FwdWaitSet’’(D) is equivalent. By the Second
Pre-Critical Hypothesis, the record of msg at FwdWaitSet’’(G) is equivalent
to (msg, {f = index.f, b = 0}, iset). The protRemove procedure does not change
this record since it does not contain a -G instability and so this remains the record at
FwdWaitSet’’(G) at the end of the constellation. The record of msg at FwdWaitSet(G)
starts out equal to the record at FwdWaitSet(D), and then the protRun procedure
zeroes out the index.b component of the record, leaving it equivalent to the final
value of the record at FwdWaitSet’’(G), as required by the Interim G Hypothesis.

LaunchQueue 

Non-G case: Initially LaunchQueue’’ = LaunchQueue. Neither the protJoin procedure
nor the protRemove procedure makes changes to LaunchQueue, so it remains the same,
as required.

G case: Initially in H process G inherits its launch queue from D, while in H’’ process
G has an empty launch queue according to the Second Pre-Critical Hypothesis.
The protRemove procedure does not change the value of LaunchQueue’’ while the
protRun procedure empties LaunchQueue. So LaunchQueue’’ becomes identical to
LaunchQueue as required.


ghost_height and flush_height 

Non-G case: Initially ghost_height’’ = ghost_height and flush_height’’ = flush_height.
Neither the protJoin nor the protRemove procedure makes changes to either value
(remember that we ignore the CheckFlush invocation at the end), so both values
remain the same, as required.

G case: Initially G inherits its state from D, so ghost_height(G) = ghost_height(D)
and flush_height(G) = flush_height(D). The Second Pre-Critical Hypothesis implies
that ghost_height’’(G) = ghost_height’’(D) and flush_height’’(G) = ghost_height’’(D).
Therefore initially ghost_height’’(G) = ghost_height(G). The protRemove procedure
does not make changes to either variable in H’’, while the protRun procedure resets
the value of flush_height to be equal to ghost_height. Therefore at the end of the
constellation:

flush_height(G) = ghost_height(G) = ghost_height’’(G) = ghost_height’’(D) = flush_height’’(G)

as required by the Interim G Hypothesis.


mpkt_out 

Non-G case: Initially mpkt_out’’ = mpkt_out. The protJoin procedure does not
change this value in H, while the protRemove procedure increments mpkt_out’’.f for
each message in FwdQueue’’[-G]. Since that queue is empty, the value of mpkt_out’’
does not change either, and so it remains the same at the end of the constellation,
as required by the Interim Non-G Hypothesis.

G case: Initially mpkt_out’’.f = mpkt_out.f by the sameness argument and it remains
so at the end of the constellation because neither the protRun procedure nor the
protRemove procedure changes that value (in the latter case because FwdQueue’’[-G]
is empty). Initially mpkt_out.b(G) = mpkt_out.b(D), while the Second Pre-Critical
Hypothesis implies that initially mpkt_out’’.b(G) = 0. The protRemove procedure
does not change that value in H’’, while the protRun procedure zeroes it in H.
Therefore mpkt_out’’.b(G) becomes equal to mpkt_out.b(G).

As a result mpkt_out’’ = mpkt_out as required by the Interim G Hypothesis.


mpkt_in[]

Non-G case: We start with the coordinates that exist in H. By the First Pre-Critical
Hypothesis, these coordinates have identical values in H and H’’. These coordinates
are not touched by either the protRemove procedure or the protJoin procedure, and
so they remain identical at the end of the constellation as required by the Interim
Non-G Hypothesis.

The mpkt_in’’[-G] coordinate is removed by the protRemove procedure, leaving a G
coordinate whose value is

mpkt_in’’[G] = {f = mpkt_in’’[D].f; b = 0}

according to the First Pre-Critical Hypothesis. The protJoin procedure creates a
new G coordinate whose value is

mpkt_in[G] = {f = mpkt_in[D].f; b = 0}

Since we have already seen that mpkt_in’’[D].f = mpkt_in[D].f, the requirement of
the Interim Non-G Hypothesis is met.

G case: Initially mpkt_in’’[](G) = mpkt_in’’[](D) according to the Second Pre-Critical
Hypothesis. The protRemove procedure removes the -G coordinate at both G
and D, and therefore at the end of the constellation the equality

mpkt_in’’[](G) = mpkt_in’’[](D)

remains valid.

On the other hand the initial relation in H is mpkt_in[](G) = mpkt_in[](D) because
G is created with the same state as D. The protJoin procedure creates a new G
coordinate at D with value

mpkt_in[G](D) = {f = mpkt_in[D].f(D); b = 0}

The protRun procedure creates a new G coordinate at G with value

mpkt_in[G](G) = {f = mpkt_out.f(G); b = 0}

The Self Channel Axiom and Lemma El guarantee that at the start of the constellation
mpkt_in[D](D) = mpkt_out(D). Since G starts out with the same state as D we also
have mpkt_out.f(G) = mpkt_out.f(D). Therefore mpkt_in[G](G) = mpkt_in[G](D)
and the vector as a whole remains identical at G and D.

From the Non-G case we know that mpkt_in’’[](D) = mpkt_in[](D) at the end of
the constellation. It follows that mpkt_in’’[](G) = mpkt_in[](G) at the end of the
constellation as required by the Interim G Hypothesis.
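
The H-side part of the G case can be replayed on concrete numbers. The sketch below is
illustrative only: the Counter class, the variable names and the numeric values are
assumptions, and the f and b fields are read as the forwarded and broadcast counters
respectively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counter:
    f: int  # forwarded-message count (assumed reading of the .f field)
    b: int  # broadcast-message count (assumed reading of the .b field)


# State of D at the start of the constellation; the values are arbitrary.
mpkt_out_D = Counter(f=4, b=7)
# Self channel: mpkt_in[D](D) = mpkt_out(D).
mpkt_in_D = {"D": mpkt_out_D, "X": Counter(f=1, b=2)}

# G is created with the same state as D.
mpkt_out_G = mpkt_out_D
mpkt_in_G = dict(mpkt_in_D)

# protJoin at D and protRun at G each create the new G coordinate.
mpkt_in_D["G"] = Counter(f=mpkt_in_D["D"].f, b=0)
mpkt_in_G["G"] = Counter(f=mpkt_out_G.f, b=0)
```

Although the two procedures read the f value from different variables, the self-channel
equality makes the new coordinates agree, so the vectors at D and G stay identical.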

ghost[] and flush[]

Non-G case: Initially, the First Pre-Critical Hypothesis implies that

ghost’’[] = ghost[] ∪ {[G] = ghost[D], [-G] = ghost[D]}
flush’’[] = flush[] ∪ {[G] = ghost[D], [-G] = ghost[D]}

The protRemove procedure removes the -G coordinate from both vectors in H’’.
The protJoin procedure creates a G coordinate in both vectors in H and sets both
of them to the value of ghost[D]. As a result ghost’’[] and flush’’[] become equal to
ghost[] and flush[] as required by the Interim Non-G Hypothesis.

G case: Initially ghost[](G) = ghost[](D) and flush[](G) = flush[](D). By the Second
Pre-Critical Hypothesis ghost’’[](G) = ghost’’[](D) and flush’’[](G) = flush’’[](D).
Using the First Inductive Hypothesis relation between the H and H’’ vectors in D
we conclude that

ghost’’[](G) = ghost[](G) ∪ {[G] = ghost[D](G), [-G] = ghost[D](G)}
flush’’[](G) = flush[](G) ∪ {[G] = ghost[D](G), [-G] = ghost[D](G)}

The protRemove procedure removes the -G coordinate from both vectors in H’’. In
G, the protRun procedure creates a G coordinate in both vectors and sets both of
them to ghost_height(G). Since G starts out with the same state as D, we have
ghost_height(G) = ghost_height(D). It follows from the Self Channel Axiom and
Lemma [ini that at the start of the constellation ghost_height(D) = ghost[D](D).
Therefore at the end of the constellation

ghost’’[G](G) = ghost[D](G) = flush’’[G](G)
ghost[G](G) = ghost_height(G) = flush[G](G)
ghost[D](G) = ghost[D](D) = ghost_height(D) = ghost_height(G)

and therefore ghost’’[](G) = ghost[](G) and flush’’[](G) = flush[](G) as required by
the Interim G Hypothesis.


6.6.2 Message broadcast request constellations 

• Type of Constellation: A message broadcast request dequeuing event at a process P. We 
assume (see l2.d.lll) that APP at a process P does not issue such a request before its Main() 
function is executed in a separate thread. 

H Transactions: An execution of the protBroadcast procedure at P.

Observed behavior in H’’: An equivalent transaction at P.

Execution of CBCAST in H’’: In H’’, process P executes the protBroadcast procedure, just
as it does in H. In the pre-critical case P ≠ ±G. Also notice that A'“ = A'^ lsee l5.4.f])
and therefore ±G do not generate message broadcast requests pre-critically in H’’.
Since the message is not yet stamped we have an actual equality msg’’ = msg and not
just an equivalence.

We start by verifying that in the pre-critical case the Second Pre-Critical Hypothesis is
preserved when P = D. Processes ±G do not participate in the constellation in the
pre-critical case. It is easy to observe that the protBroadcast call does not change any state
variables other than LaunchQueue, BcastWaitSet and mpkt_out.b. All of these variables
are stipulated by the Second Pre-Critical Hypothesis to be equal to zero at ±G and so
we are done with this case.

To verify the other inductive hypotheses and verify the side effects, we follow the
execution step by step.

In the first step, we check if v_gap is zero. If it is not, we append the message to
LaunchQueue. By the relevant inductive hypothesis (either First Pre-Critical, Interim
Non-G, Interim G or Convergent Hypothesis) v_gap’’ = v_gap and LaunchQueue’’ =
LaunchQueue. Since msg’’ = msg this step is executed identically in H and H’’ and
preserves the hypotheses.

The next and final step is executed if v_gap = 0. Notice that during the interim
period v_gap > 0 and therefore for this step only the First Pre-Critical and Convergent
Hypotheses are relevant. This step contains the following sub-steps:

— The counter mpkt_out.b is incremented. Initially mpkt_out’’.b = mpkt_out.b and
therefore the inductive hypotheses are preserved.

— The next few lines stamp the message. Here the inductive hypotheses require that
the stamping be equivalent, rather than equal, in H and H’’. Since self’’ = self and
cur_view’’ = cur_view under all the hypotheses these parts of the stamp end up being
equal as required.

As for the message vector time, it is computed from the process vector time, and
is adjusted at the self coordinate. The Convergent Hypothesis is in force when
cur_view > V_crit (see 6.4) and it stipulates that all the variables in sight are equal in
H and H’’, resulting in vt’’[] = vt[]. This matches the equivalence requirement (see
Definition ITIh .

The First Pre-Critical Hypothesis is in force when cur_view < V_crit and it stipulates
that

vt’’[] = vt[] ∪ {[G] = 0, [-G] = 0}

The self coordinate of vt’’ is adjusted by mpkt_out.b − mpkt_in[self].b. If P ≠ G then
this adjustment is the same in H and H’’ and does not affect the G coordinate. But
we already know that in the pre-critical case P ≠ G, so the hypothesis is preserved
in this case.

— In the next step we queue the message multicast to all the members of ContactSet.
Therefore we have an equivalent side effect to the one in H as observed, and Lemma
24 guarantees that the computed target set ContactSet’’ is equal to the observed
target set.
— In the next and last step, a record for the message is added to BcastWaitSet. In the
pre-critical case, the message is equivalent and

index’’ = mpkt_out’’ = mpkt_out = index

mpkt_in’’[X].f = mpkt_in[X].f = iset[X].f      if X ≠ ±G
mpkt_in’’[±G].f = mpkt_in[D].f = iset[D].f     if X = ±G

mpkt_in’’[X].b = mpkt_in[X].b = iset[X].b      if X ≠ ±G
mpkt_in’’[±G].b = 0                            if X = ±G

as required by the First Pre-Critical Hypothesis.
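
The steps of the broadcast procedure that the argument walks through can be summarized in
a short sketch. This is not the protocol's actual code: the Proc class, the field layout,
the reading of the vector-time adjustment as an additive offset, and the choice to give
the instability set one entry per member of ContactSet are all assumptions made for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    cur_view: int
    v_gap: int
    vt: dict            # process vector time: process -> count
    mpkt_in: dict       # process -> {"f": int, "b": int}
    contact_set: tuple
    mpkt_out: dict = field(default_factory=lambda: {"f": 0, "b": 0})
    launch_queue: list = field(default_factory=list)
    bcast_wait_set: list = field(default_factory=list)
    outbox: list = field(default_factory=list)


def prot_broadcast(p, msg):
    # Step 1: if v_gap is not zero the request is only appended to LaunchQueue.
    if p.v_gap != 0:
        p.launch_queue.append(msg)
        return
    # Step 2a: count the new original broadcast.
    p.mpkt_out["b"] += 1
    # Step 2b: stamp the message; the self coordinate of the vector time is
    # adjusted by mpkt_out.b - mpkt_in[self].b (assumed to be an additive offset).
    vt = dict(p.vt)
    vt[p.name] = vt[p.name] + p.mpkt_out["b"] - p.mpkt_in[p.name]["b"]
    stamped = {"body": msg, "sender": p.name, "view": p.cur_view, "vt": vt}
    # Step 2c: queue the multicast to every member of ContactSet.
    p.outbox.append((stamped, p.contact_set))
    # Step 2d: add a record (message, index, iset) to BcastWaitSet.
    index = dict(p.mpkt_out)
    iset = {x: dict(p.mpkt_in[x]) for x in p.contact_set}
    p.bcast_wait_set.append((stamped, index, iset))


def make(v_gap):
    return Proc(name="P", cur_view=3, v_gap=v_gap,
                vt={"P": 2, "Q": 5},
                mpkt_in={"P": {"f": 0, "b": 2}, "Q": {"f": 1, "b": 5}},
                contact_set=("Q",),
                mpkt_out={"f": 0, "b": 2})


blocked, ready = make(v_gap=1), make(v_gap=0)
prot_broadcast(blocked, "hello")
prot_broadcast(ready, "hello")
```

Running the same request against a process with v_gap > 0 and one with v_gap = 0 shows the
two branches: the first only grows LaunchQueue, the second produces one multicast side
effect and one BcastWaitSet record.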


6.6.3 Message and acknowledgment packet constellations 

• Type of Constellation: An acknowledgement packet, sent by a process Q in response to
an original broadcast of a message m, is received at the originating process P. Note
that neither G nor -G broadcast original messages during the pre-critical period or the
interim period, therefore we can assume P ≠ ±G in these cases.

H Transactions: A single transaction, with an execution of the ReceiveAck procedure at P.

Observed behavior in H’’: Post-critically, or when Q ≠ D, there is a single transaction
at P with an equivalent trigger and equivalent side effects (see Definition . Pre-critically
when Q = D, there are three transactions at P, triggered, in that order, by
an acknowledgement packet from -G, G and D. The first two have no side effects. The
third one has side effects equivalent to the original transaction in H.

Execution of CBCAST in H’’: We divide the analysis into several cases:

— The constellation is post-critical.

— The constellation is pre-critical, with P,Q ≠ D

— The constellation is pre-critical with Q ≠ D and P = D

— The constellation is pre-critical, with Q = D and P ≠ D

— The constellation is pre-critical with P = Q = D

In the first three cases, where the constellation is post-critical or Q ≠ D, the execution
is a single invocation of ReceiveAck by P. Only in the third of these cases (when P = D)
is there anything to prove about the Second Pre-Critical Hypothesis. We will ignore that
part for the moment and return to it later. We will deal with the remaining two cases
(where Q = D) later as well.

The Convergent case is not entirely trivial here because the trigger is equivalent but
perhaps not equal (P_ack(m’’) ≡ P_ack(m)). In fact the trigger is equal, but showing that
requires the History Equivalence Theorem, so we will not rely on this fact and use a
weaker argument instead.
To verify the preservation of the First Pre-Critical Hypothesis, Interim Non-G Hypothesis,
Interim G Hypothesis or Convergent Hypothesis, as the case may be, we go step by
step over the procedure.

It starts with a check for the existence of a record for m in WaitSet. The record is
guaranteed to be there but we have not proved that. However it does follow from the
First Pre-Critical and from the two Interim Hypotheses that the record is there in H if
and only if it is there in H’’.

The Convergent case is the tricky one here. We know that m’’ ≡ m. If m is pre-critical
then it follows from the Convergent Hypothesis that the record is not found in either H
or H’’. If m is post-critical then m = m’’ and the record is found in H if and only if it is
found in H’’. Therefore the same decision is made in both histories regardless of which
hypothesis is in force, and in the Convergent case if the record is found then the H and
H’’ triggers are equal.

If the record is not found then the procedure call exits and we are done. With the check
successful, the call continues through the following steps:

— The Q entry is removed from the instability set of the message. In the pre-critical
case Q ≠ ±G, and the Q entry exists in both H and H’’ as we have shown. Therefore
removing it in both histories preserves the inductive hypothesis.

— If the instability set becomes empty, the record is removed from WaitSet and the
CheckFlush procedure is called. In the post-critical case the instability vector is
identical and therefore either empty in both histories or non-empty in both. In the
pre-critical case, even though the instability sets are different, they are still empty
or non-empty together. Removing the record from WaitSet preserves the inductive
hypothesis and by Corollary [U so does invoking CheckFlush. Moreover the same
corollary guarantees that the side effects produced by CheckFlush are equal, and
therefore equivalent, to the observed side effects. Lemma [M] guarantees that the
target set produced by the execution of CheckFlush for each side effect, namely
ContactSet’’, is equal to the observed target set.

To verify the Second Pre-Critical Hypothesis in the third case, notice that since the
message is original, any changes in its record in WaitSet only affect the BcastWaitSet
portion of WaitSet. Since the inductive hypothesis only requires that BcastWaitSet be
empty in ±G, it remains valid until the invocation of CheckFlush. In this case we cannot
rely on Corollary O because ±G do not participate in the constellation and therefore do
not execute CheckFlush. Instead we rely on Lemma [H

If FwdWaitSet’’(D) ≠ ∅ when D calls CheckFlush, the procedure exits immediately
and we are done. If FwdWaitSet’’(D) = ∅ then as we have seen this implies that
the set was empty since the beginning of the transaction. This in turn implies that
ghost_height’’(D) = cur_view’’(D) + v_gap’’(D). This follows from Lemma ISHS]) in the
case v_gap’’(D) > 0. If v_gap’’(D) = 0 then Lemma [8l[5|) implies that flush’’[D](D) =
cur_view’’(D) + v_gap’’(D) and the equation follows from Lemma [51[5]) .

Since ghost_height’’(D) = cur_view’’(D) + v_gap’’(D) the CheckFlush procedure does not
produce a ghost side effect and as a result does not change the value of ghost_height’’(D).
Therefore the Second Pre-Critical Hypothesis remains valid.

We are left with the two pre-critical cases where Q = D. 

We start by verifying the preservation of the First Pre-Critical Hypothesis. The execution
in H’’ consists of three executions of ReceiveAck at P, triggered by the receipt of a packet
from -G, G and D respectively.

The first two executions remove the -G and G entries from the instability set, but do not
empty out the set because it still possesses a D entry. Since CheckFlush is not executed
this guarantees that the two executions have no side effects, as expected. However
the state at P is now in violation of the First Pre-Critical Hypothesis because m has
instability at D without matching instabilities at ±G. The third execution restores the
hypothesis by removing D from the instability set. The remaining part of the third
execution preserves the inductive hypothesis and produces the expected side effects for
the exact same reasons as the previous cases that we already analyzed.
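
The way three acknowledgements in H’’ collapse to the effect of one in H can be seen in a
small sketch. It is illustrative only: the instability set is reduced to a set of process
names, and receive_ack and its check_flush callback are hypothetical stand-ins for the
ReceiveAck and CheckFlush procedures.

```python
def receive_ack(wait_set, msg_id, sender, check_flush):
    """Sketch of the acknowledgement steps described above (names are assumptions)."""
    record = wait_set.get(msg_id)
    if record is None:           # no record for the message: exit, no side effects
        return
    record.discard(sender)       # remove the sender's entry from the instability set
    if not record:               # the message has become stable
        del wait_set[msg_id]
        check_flush()


calls = {"H": 0, "H2": 0}


def flush_counter(key):
    def bump():
        calls[key] += 1
    return bump


wait_h = {"m": {"D"}}                 # WaitSet at P in H
wait_h2 = {"m": {"D", "G", "-G"}}     # WaitSet'' at P in the modified history

receive_ack(wait_h, "m", "D", flush_counter("H"))

trace = []
for q in ("-G", "G", "D"):            # order of the three transactions in H''
    receive_ack(wait_h2, "m", q, flush_counter("H2"))
    trace.append(calls["H2"])
```

Only the third acknowledgement empties the instability set, so the flush check runs once
in each history and both wait sets end up empty.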

With regard to the preservation of the Second Pre-Critical Hypothesis, it is only relevant
in the fifth case where P = D. Just as in the third case, the changes in WaitSet do
not affect the validity of the inductive hypothesis, and it continues to be valid until
the moment that D calls CheckFlush. Just as before, Lemma [S] guarantees that this
invocation does not produce a ghost broadcast and therefore preserves the Second
Pre-Critical Hypothesis.

• Type of Constellation: An acknowledgement packet, sent by a process Q in response to 
the forwarding of a message m, is received at the forwarding process P, in H. 

H Transactions: A single transaction, with an execution of the ReceiveAck procedure at P.

Observed behavior in H’’: There are a number of separate cases:

— Post-critically, or when P,Q ≠ D, there is a single transaction at P with the same
trigger and side effects.

— Pre-critically when P = D and Q ≠ D, there are three transactions at D, G and
-G respectively, all triggered by acknowledgement packets from Q for the same
forwarded message. The observed side effects at D are equal to the original side effects
in H. At ±G the side effects depend on the D side effects in the following way.
If there is a ghost broadcast out of D then there is a same height ghost broadcast
followed by a same height flush broadcast out of ±G. If there is no ghost broadcast
out of D, then there are no side effects at ±G.

— Pre-critically when P ≠ D and Q = D, there are three transactions at P, triggered,
in that order, by an acknowledgement packet from -G, G and D for the same
forwarded message. The first two have no side effects. The third one has the same side
effect as the original transaction in H.

— Pre-critically when P = Q = D, there are nine transactions in total, three each at
D, G and -G, triggered at each process, in that order, by acknowledgement packets
from -G, G and D, all for the same forwarded message. The first two transactions in
each process have no side effects. The third transaction at D has the same side effects as
the original transaction at D. The third transaction at each of ±G has side effects
that depend on the original side effects at D in the same way as in the second case:
If there is a ghost broadcast out of D then there is a same height ghost broadcast
followed by a same height flush broadcast out of ±G. If there is no ghost broadcast
out of D, then there are no side effects for the third transaction at ±G.

Execution of CBCAST in H’’: As before we divide the analysis into the different cases:

— The constellation is post-critical.

— The constellation is pre-critical, with P,Q ≠ D

— The constellation is pre-critical with Q ≠ D and P = D

— The constellation is pre-critical, with Q = D and P ≠ D

— The constellation is pre-critical with P = Q = D

The analysis of cases one, two and four is identical to the analysis of similar cases in the
case of an acknowledgement for an original message, and we will not repeat it here.

In the third case there are three executions of the ReceiveAck procedure, one execution at
each of D, G and -G. The analysis of the First Pre-Critical Hypothesis proceeds exactly
like the analysis of this case for an original message acknowledgement. As for the Second
Pre-Critical Hypothesis, the record for the message in WaitSet is in the forwarding part
FwdWaitSet, which contains the same messages in ±G as it does in D. Moreover, for each
message it contains the same instability set. As a result the execution of ReceiveAck
proceeds in the same way in all three of these processes and the Second Pre-Critical
Hypothesis is preserved. By Corollary El the invocation of CheckFlush produces side
effects that match the observed side effects and Lemma M guarantees that these side
effects have the observed target set.

In the fifth case there are three executions of the ReceiveAck procedure at each process,
triggered by the receipt of an acknowledgement packet from -G, G and D, in this order.
The analysis of the First Pre-Critical Hypothesis proceeds exactly like the analysis of
this case for an original message acknowledgement. As for the Second Pre-Critical
Hypothesis, the record for the message m in WaitSet is in the forwarding part FwdWaitSet,
which contains the same messages in ±G as it does in D. Moreover, for each message
it contains the same instability set. As a result, all three executions of ReceiveAck
proceed in the same way in all three of these processes. By Corollary [51 the invocation
of CheckFlush preserves the Second Pre-Critical Hypothesis and produces side effects
that match the observed side effects and Lemma |2l] guarantees that these side effects
have the observed target set. Therefore overall the Second Pre-Critical Hypothesis is
preserved and the side effects in ±G match the observed side effects.

• Type of Constellation: An original message packet containing message m is received at 
process P in H. 

H Transactions: A single transaction, with an execution of the ReceiveMessage procedure
at P.

Observed behavior in H’’: There are two cases:

— Post-critically, or when P ≠ D, there is a single transaction at P with the same
trigger and side effect as the original transaction in H.

— Pre-critically, when P = D, there are three transactions, one each at D, G and -G,
with equivalent triggers and side effects as the original transaction in H.

Execution of CBCAST in H’’:

— In the post-critical and P ≠ D cases, process P invokes the ReceiveMessage procedure.
The execution proceeds through the following steps:

First an acknowledgment is sent back to the sender. This ensures that the side effect
in H’’ is the expected one.

The next step determines whether the message is original. Since the value of ORIG(m)
is equal in both histories, the determination resolves the same way in both histories,
and since the message is original in H it is deemed to be original in H’’ as well. As
a result mpkt_in[ORIG(m)].b is incremented.

In the pre-critical case ORIG(m) ≠ ±G and therefore incrementing the counter preserves
the First Pre-Critical Hypothesis. In the post-critical case, the mpkt_in[] vector
is identical at H and H’’, thus preserving all the relevant hypotheses (Interim G,
Interim Non-G and Convergent Hypotheses). The Second Pre-Critical Hypothesis is
automatically preserved because neither D nor ±G participate in the constellation
in this case.

The next few steps check for duplicates. These checks rely on values, including
cur_view, ReceiveSet, the ORIG(m) coordinate in vt and VT(m), that are all guaranteed
by all the relevant inductive hypotheses to be identical or equivalent at H and
H’’. Therefore, the duplicate check yields the same results in both histories.

At this point a comment is due about the Convergent case. The trigger in this case
is equivalent (P_msg(m’’) ≡ P_msg(m)) but not necessarily equal. Unlike the case of an
acknowledgement packet, a non-equal trigger may actually occur in the Convergent
period. For that to happen it must be that VIEW(m) < V_crit (see Definition 1221).
Since P is convergent, cur_view > V_crit. This means that the message is obsolete.
Since labeled step [2] of the procedure discards obsolete messages and exits, there is
no contamination of the state with non-equal values and the inductive hypothesis
holds.


If the message is not a duplicate, the next and last steps are:

* The message m is added to ReceiveSet. Since both the message m and ReceiveSet
are equivalent in both histories, this step preserves the equivalence of ReceiveSet
and the inductive hypothesis.

* The message m is appended to the tail of the sender’s forwarding queue. In this
case as well FwdQueue[sender] is equivalent in H and H’’, and the step preserves
the equivalence and the relevant inductive hypothesis.

* The Scan procedure is invoked. We want to use Lemma [25] to prove that this
call preserves all the inductive hypotheses and has no side effects. But the
lemma requires that the ApplyMessage up-call be identical at P in H and H’’.
According to 5.4.10 this is the case for P ≠ G. This covers the pre-critical
case since P, as an H process, cannot be equal to G. In the post-critical case
the definition of ApplyMessage’’@G shows that it behaves, post-critically, like
ApplyMessage@G, and we are done.

— In the pre-critical case when P = D, all three processes D, G and -G invoke the
ReceiveMessage procedure. The verification of the First Pre-Critical Hypothesis
proceeds in this case exactly as it does in the P ≠ D case. As for the Second
Pre-Critical Hypothesis, we retrace the execution steps:

First an acknowledgment is sent back to the sender. This ensures that the side
effects at G and -G are the expected ones.

The next step determines whether the message is original. Since the value of ORIG(m)
is equal in all three processes, the determination resolves the same way in all three,
and since the message is original at D it is deemed to be original at ±G as well. As
a result mpkt_in[ORIG(m)].b is incremented at all three processes.

Since the mpkt_in[] vector is identical at all three processes, incrementing the counter
at the sender coordinate preserves the inductive hypothesis.

The next few steps check for duplicates. These checks rely on values, including
cur_view, ReceiveSet, vt and VT(m), that are all guaranteed by the Second
Pre-Critical Hypothesis to be identical at all three processes. Therefore, the duplicate
check yields the same results in all three.

If the message is not a duplicate, the next and last steps are:

* The message m is added to ReceiveSet. Since both the message m and ReceiveSet
are identical in all three processes, this step preserves the Second Pre-Critical
Hypothesis.

* The message m is appended to the tail of the sender’s forwarding queue. In this
case as well FwdQueue[] is identical in all three processes and the Hypothesis is
preserved.

* The Scan procedure is invoked. The ApplyMessage’’@G up-call behaves,
pre-critically, like ApplyMessage@D, which in turn behaves like ApplyMessage’’@D
(see 5.4.10). Therefore ApplyMessage behaves the same way in D and in ±G
during the pre-critical period so it follows from Lemma [26] that this call preserves
all the inductive hypotheses and has no side effects.
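
The receive steps retraced above can be sketched as follows. This is a simplified model
under stated assumptions, not the procedure itself: a packet is treated as original when
its sender equals ORIG(m), the duplicate and obsolescence checks are reduced to a view
comparison and a ReceiveSet membership test, and the Scan call is left out. All names and
the dictionary layout are hypothetical.

```python
import copy


def receive_message(state, msg, sender, acks):
    # 1. Always acknowledge the packet to its sender.
    acks.append((sender, msg["id"]))
    # 2. Count the packet: .b for an original, .f for a forwarded copy.
    if sender == msg["orig"]:
        state["mpkt_in"][msg["orig"]]["b"] += 1
    else:
        state["mpkt_in"][sender]["f"] += 1
    # 3. Discard obsolete messages and duplicates (simplified check, an assumption).
    if msg["view"] < state["cur_view"] or msg["id"] in state["receive_set"]:
        return False
    # 4. Record the message and append it to the sender's forwarding queue.
    state["receive_set"].add(msg["id"])
    state["fwd_queue"].setdefault(sender, []).append(msg["id"])
    # 5. The Scan procedure would run here; it is omitted from this sketch.
    return True


template = {"cur_view": 2, "receive_set": set(),
            "mpkt_in": {"X": {"f": 0, "b": 0}}, "fwd_queue": {}}
# D, G and -G start from identical state, as in the pre-critical P = D case.
states = {name: copy.deepcopy(template) for name in ("D", "G", "-G")}
acks = {name: [] for name in states}
msg = {"id": "m1", "orig": "X", "view": 2}

accepted = {n: receive_message(states[n], msg, "X", acks[n]) for n in states}
repeat = receive_message(states["D"], msg, "X", acks["D"])
```

Starting from identical state, the three processes take the same branch at every step and
end in identical state; a second receipt of the same packet is still acknowledged and
counted but is then dropped as a duplicate.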

• Type of Constellation: A message packet containing a forwarded message m is received at 
process P from process Q in H. 


H Transactions: A single transaction, with an execution of the ReceiveMessage procedure
at P.

Observed behavior in H’’: There are several cases:

— Post-critically, or when P,Q ≠ D, there is a single transaction at P with an
equivalent trigger and side effect as the original transaction in H.

— Pre-critically, when P = D and Q ≠ D, there are three transactions, one each at D,
G and -G, with triggers and side effects equivalent to the original transaction in H.

— Pre-critically, when P ≠ D and Q = D, there are three transactions at P, triggered
by the receipt of a message packet from D, -G, and G, in that order, each with the
side effect of an acknowledgement packet being sent to the respective sender. Each
transaction is equivalent to the original one in H.

— Pre-critically, when P = Q = D, there are nine transactions, three each at D, G and
-G, triggered by the receipt of a message packet from D, -G and G, in that order,
each with the side effect of an acknowledgement packet being sent to the respective
sender. Each transaction is equivalent to the original one in H.

Execution of CBCAST in H^G:

— The first two cases proceed just like the equivalent cases of an original message
packet, except that the message counter is incremented at the .f coordinate rather
than the .b coordinate.

— In the pre-critical case where P ≠ D and Q = D, the processes D and ±G do not
participate in the constellation and therefore the Second Pre-Critical Hypothesis is
automatically preserved. The process P executes the ReceiveMessage procedure three times.
First it executes the call for the message that it receives from D, and then for the
same message being received from -G and G. The first execution proceeds in parallel
with the original execution in history H. The analysis of this first execution proceeds
in almost the same way as the analysis of the receipt of an original message does.
There is an important difference, however. When mpkt_in[D].f is incremented the
First Pre-Critical Hypothesis is violated because mpkt_in[±G].f is not incremented
at the same time. This violation does not, however, affect the rest of the analysis,
and remains an isolated violation. This includes the invocation of the Scan procedure
since Lemma [55] does not require this particular part of the inductive hypothesis.

The subsequent two executions proceed through the following steps: 

First, an acknowledgement packet is sent to the respective sender, generating the 
expected side effect. 

The next step increments the forwarded message counter (mpkt_in[-G].f in the second
execution and mpkt_in[G].f in the third execution). These steps remove the violation
of the First Pre-Critical Hypothesis.

The next steps check for duplicates. These steps cause the call to exit in both 
executions because the message is obviously a duplicate - it was already placed in 
ReceiveSet and possibly even delivered by the first call. 

— In the pre-critical case when P = Q = D, each of the processes D, G and -G
executes the ReceiveMessage procedure three times: first for the message that each
receives from D, then for the same message that each receives from -G, and then from
G. The analysis of the state at D in H and H^G proceeds exactly as in the previous
case, where P ≠ D and Q = D, proving that the First Pre-Critical Hypothesis is
preserved in this case. All we have to show is that the three executions at G and
at -G result in the preservation of the Second Pre-Critical Hypothesis. This part of
the analysis is just a recap of previous cases. The analysis of the first execution at
all three processes (where the message is received from D) proceeds exactly like the
same part of the analysis of the case of an original message that is received at D.
The next two executions proceed identically at D and at G and -G, resulting in a
determination that the message is a duplicate.
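
The handling of the three copies of a forwarded message can be illustrated with a small model. The sketch below is only an illustration under simplifying assumptions: the class Receiver, its fields and the method receive_forwarded are hypothetical names, the forwarded counter mpkt_in[...].f is reduced to one integer per sender, and delivery, Scan and forwarding are omitted. It is not the ReceiveMessage procedure itself.

```python
from dataclasses import dataclass, field

SENDERS = ("D", "-G", "G")  # the three pre-critical senders of the same message


@dataclass
class Receiver:
    """Toy receiver state; names are hypothetical stand-ins for the text's variables."""
    fwd_count: dict = field(default_factory=dict)   # stands in for mpkt_in[sender].f
    receive_set: set = field(default_factory=set)   # stands in for ReceiveSet
    acks_sent: list = field(default_factory=list)   # recorded side effects

    def receive_forwarded(self, sender: str, msg_id: str) -> bool:
        """Process one copy; return True if it is new, False if it is a duplicate."""
        # Step 1: acknowledge the copy to whoever sent it.
        self.acks_sent.append((sender, msg_id))
        # Step 2: increment the forwarded-message counter for that sender.
        self.fwd_count[sender] = self.fwd_count.get(sender, 0) + 1
        # Step 3: duplicate check; a duplicate ends the call here.
        if msg_id in self.receive_set:
            return False
        self.receive_set.add(msg_id)
        return True

    def counters_agree(self) -> bool:
        """True when the counters for D, -G and G hold the same value."""
        return len({self.fwd_count.get(s, 0) for s in SENDERS}) == 1


p = Receiver()
outcomes, agree = [], []
for sender in SENDERS:
    outcomes.append(p.receive_forwarded(sender, "m"))
    agree.append(p.counters_agree())
```

In this model the counters for D, -G and G disagree after the first call and agree again after the third, which mirrors the temporary violation of the First Pre-Critical Hypothesis described above, while each of the three calls still produces its acknowledgement.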


6.6.4 Ghost and flush packet constellations 

• Type of Constellation: A ghost packet of height v is received at a process P from a process 
Q in H. 

H Transactions: A single transaction, with an execution of the ReceiveGhost procedure at
process P and no side effects.

Observed behavior in H^G: There are several cases:

— Post-critically, or when P, Q ≠ D, there is a single transaction at P with the same
trigger and no side effects.

— Pre-critically, when P = D and Q ≠ D, there are three transactions, one each at D,
G and -G, each with the same trigger and no side effects.

— Pre-critically, when P ≠ D and Q = D, there are five transactions at P, triggered by
the receipt of identical ghost packets of the same height from -G and then G, followed 
by flush packets of the same height from -G and then G and finally a ghost packet 
of the same height from D. This order is induced from the adjustment coordinate 
of the labels of the triggers. None of these transactions have any side effects. 

— Pre-critically, when P = Q = D, there are fifteen transactions, five each at D, G 
and -G, triggered at each process by the receipt of two ghost packets of the same 
height from -G and then G, followed by flush packets of the same height from -G 
and then G and finally a ghost packet of the same height from D. None of these 
transactions have any side effects. 

Execution of CBCAST in H^G:

— In the post-critical case P executes the ReceiveGhost procedure which results in
raising ghost[Q] to v in both H and H^G. This preserves the relevant one among the
Interim non-G, Interim G and Convergent Hypotheses, because all of them stipulate
that the ghost[] vector is equal in H and H^G. The procedure does not generate any
side effects, which matches the observed lack of side effects in H^G.

— In the pre-critical case where P, Q ≠ D, process P executes the ReceiveGhost
procedure which results in raising ghost[Q] to v in both H and H^G. This preserves
the Second Pre-Critical Hypothesis because P ≠ D and so no changes occur to the
states of either D, G or -G. To see that the First Pre-Critical Inductive Hypothesis
is preserved observe that Q ≠ ±G because Q exists in H and ±G do not, and since
Q ≠ D by assumption, neither ghost[±G] nor ghost[D] is affected. The procedure
does not generate any side effects, which matches the observed lack of side effects in
H^G.


— In the pre-critical case when P = D and Q ≠ D, the three processes D, G and -G
each execute the ReceiveGhost procedure, which raises the ghost[Q] level in each
process to v. In this case the Second Pre-Critical Hypothesis is preserved because
this hypothesis stipulates that ghost[] is equal in all three processes, and they all
perform the same change. The First Pre-Critical Hypothesis is preserved for the
same reason as in the previous case. The procedure does not generate any side
effects, which matches the observed lack of side effects in H^G.

— In the pre-critical case when P ≠ D and Q = D, the process P executes the
ReceiveGhost procedure twice, once for each ghost packet trigger from -G and G,
then the ReceiveFlush procedure twice, once for each flush packet trigger from -G
and G, and finally the ReceiveGhost procedure once more for the ghost packet
trigger from D. Since D and ±G do not participate in the constellation, the Second
Pre-Critical Hypothesis is automatically preserved. At P, the five calls raise the
values of ghost[-G], ghost[G], flush[-G], flush[G] and ghost[D] to v. This conforms
with the First Pre-Critical Hypothesis. In addition, the calls to ReceiveFlush cause
TryToInstall to be invoked.

By the First Pre-Critical Hypothesis the value of v_gap is the same at P in H and
H^G at the start of the constellation.


If v_gap = 0 at the start of the constellation then by Lemma [51|I0I) LaunchQueue = ∅
at P. The first block of TryToInstall does nothing, and the execution of the call
skips to the second and last block, where the messages in LaunchQueue are broadcast.
Since LaunchQueue is empty the first execution of TryToInstall has no side effects and
does not change the process state. The second execution of TryToInstall encounters
the same values of v_gap and LaunchQueue as the first execution, and therefore it
also does nothing. As a result the whole constellation preserves both pre-critical
hypotheses and produces the observed lack of side effects in H^G.


The case where v_gap > 0 is more subtle. First, it follows from the Piggyback Axiom
in H that v < cur_view + v_gap.

By Lemma [TUI at the start of the constellation at P in H, ghost[D] < v.

Furthermore it follows from Lemma 1511511 that at the start of the constellation we
have at P, in H:


flush[D] < ghost[D] < v < cur_view + v_gap


The First Pre-Critical Hypothesis ensures that these inequalities are true in H^G as well.

The constellation does not change the value of flush[D], because it does not include
the receipt of a flush packet from D. Therefore the test loop at labeled step [T] of
TryToInstall fails and the call exits without changing the process state and without
any side effects. As a result the same happens in the second call to TryToInstall
and the constellation as a whole preserves the First Pre-Critical Hypothesis and
conforms with the observed behavior of H^G.

— In the pre-critical case when P = D and Q = D, the process D executes the
ReceiveGhost procedure twice, once for each ghost packet trigger from -G and G,
then the ReceiveFlush procedure twice, once for each flush packet trigger from -G
and G, and finally the ReceiveGhost procedure once more for the ghost packet trigger
from D. The exact same thing happens at G and -G, for a grand total of fifteen
procedure calls.


The exact same analysis as in the previous case proves that the First Pre-Critical
Hypothesis is preserved with respect to D, and that the resulting side effects in D
conform with the observed behavior in H^G (namely, no side effects).

As for the execution at ±G in H^G, the Second Pre-Critical Hypothesis ensures that
±G have the same values of ghost[], flush[] and v_gap as D. Therefore the execution
of the constellation in ±G proceeds in the exact same way as in D, with the same
changes to the state and the same lack of side effects.
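
The five-call sequence analysed above can be replayed on a toy state machine. The sketch below is a simplified model, not the protocol's pseudo-code: the class ViewState and its methods are hypothetical, and the installation test is assumed to require every tracked member's flush level to reach cur_view + v_gap (the text only states that the test at step [T] fails while flush[D] is too low). Starting values are arbitrary and chosen so that flush[D] < ghost[D] < v < cur_view + v_gap.

```python
from dataclasses import dataclass, field


@dataclass
class ViewState:
    """Toy per-process state; an assumed simplification of the variables in the text."""
    cur_view: int
    v_gap: int
    members: tuple                                   # processes whose flush level gates installation (assumption)
    ghost: dict = field(default_factory=dict)
    flush: dict = field(default_factory=dict)
    launch_queue: list = field(default_factory=list)
    side_effects: list = field(default_factory=list)

    def receive_ghost(self, q: str, v: int) -> None:
        # Raises ghost[q] to v; never lowers it and never emits packets.
        self.ghost[q] = max(self.ghost.get(q, 0), v)

    def receive_flush(self, q: str, v: int) -> None:
        # Raises flush[q] to v, then attempts installation.
        self.flush[q] = max(self.flush.get(q, 0), v)
        self.try_to_install()

    def try_to_install(self) -> None:
        target = self.cur_view + self.v_gap
        if self.v_gap > 0:
            # Assumed gate: exit unless every member has flushed up to the target.
            if any(self.flush.get(q, 0) < target for q in self.members):
                return
            self.cur_view, self.v_gap = target, 0
            self.side_effects.append(("installed", target))
        # Second block: broadcast whatever is waiting in the launch queue.
        while self.launch_queue:
            self.side_effects.append(("broadcast", self.launch_queue.pop(0)))


# Pre-critical case P != D, Q = D, with v_gap > 0 and v = 4 < cur_view + v_gap = 5.
p = ViewState(cur_view=3, v_gap=2, members=("D", "-G", "G"),
              ghost={"D": 2}, flush={"D": 1})
v = 4
p.receive_ghost("-G", v)
p.receive_ghost("G", v)
p.receive_flush("-G", v)
p.receive_flush("G", v)
p.receive_ghost("D", v)

# Same sequence with v_gap = 0 and an empty launch queue.
q = ViewState(cur_view=3, v_gap=0, members=("D", "-G", "G"))
q.receive_flush("-G", 3)
q.receive_flush("G", 3)
```

In both runs the model records no side effects and leaves cur_view unchanged, in line with the analysis above: with v_gap > 0 the low value of flush[D] blocks installation, and with v_gap = 0 the empty launch queue leaves nothing to broadcast.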

• Type of Constellation: A flush packet of height v is received at a process P from a process 
Q in H. 

H Transactions: A single transaction, with an execution of the ReceiveFlush procedure at
process P.

Observed behavior in H^G: There are two cases:

— Post-critically, or when P ≠ D, there is a single transaction at P with the same
trigger and equivalent side effects.

— Pre-critically, when P = D, there are three transactions at D, G and -G, each
triggered by an identical flush packet from Q, and with the side effects at D being
equivalent to the ones of the original transaction in H, and with the transactions in
G and -G having no side effects.

Execution of CBCAST in H^G:

— Post-critically, or when P ≠ D, the process P executes the ReceiveFlush procedure
in H^G.


The first step in this call increments the value of flush[Q]. Post-critically, this step
obviously preserves the Interim G, Interim non-G and Convergent Hypotheses, where
relevant. Pre-critically, the Second Pre-Critical Hypothesis is preserved because
P ≠ D and so no changes occur at either D, G or -G. To see why the First
Pre-Critical Hypothesis is preserved, notice first that in the pre-critical case Q ≠ ±G
because Q exists in H and ±G do not, which is why the flush[±G] coordinates are
not affected in H^G. Notice also that the value of ghost[D] is not affected in H. Taken
together, these two observations explain why the hypothesis is preserved by the first
step in this case.


The next and last step of the procedure invokes the TryToInstall procedure. By Lemma
EiH the TryToInstall call preserves the inductive hypothesis in this case and generates
equivalent side effects. It follows from Lemma [23] that the side effects have the
observed target set.

— In the pre-critical case when P = D, the analysis of the First Pre-Critical Hypothesis
is identical to the analysis of the previous case. For the Second Pre-Critical
Hypothesis, there are three transactions in H^G, at D, G and -G, each one executing the
ReceiveFlush procedure. This call increments the value of flush[Q], which preserves
the inductive hypothesis since flush[] is identical in all three processes in this case.
In the second and last step the invocation of TryToInstall preserves the inductive
hypothesis and generates the expected side effects according to Lemma [5111


6.6.5 Donation and co-donation packet constellations

• Type of Constellation: A non-critical donation packet is received at a process J from a
process P in H. 

H Transactions: A single transaction, with an execution of the ReceiveDonation procedure
at process J.

Observed behavior in H^G: A single transaction at J that is equivalent to the original
H transaction.


Execution of CBCAST in H^G: The process J executes the ReceiveDonation procedure in
H^G.

Notice that since the donation is not critical then by definition J ≠ G. Also by definition,
donation packets are only sent post-critically. Therefore the only two possible operative
inductive hypotheses are the Interim non-G Hypothesis or the Convergent Hypothesis.

The ReceiveDonation call starts with J adding P to ContactSet. According to either the
Interim non-G Hypothesis or the Convergent Hypothesis, ContactSet^G = ContactSet
and so the first step preserves the inductive hypothesis.

In the next two steps, J sends a co-donation to P in both histories. The contents of
co_donation are made up of state variables that are equivalent in H and H^G regardless
of whether the Interim non-G or the Convergent Hypothesis is in force. Therefore this
side effect is equivalent in both histories.


The next few steps construct the ordered set UNT. It follows from the two Hypotheses
and from the fact that donation^G = donation that UNT^G = UNT, since all the ingredients
used in the construction of this set are equivalent in both histories. In addition the same
hypotheses imply that mpkt_in^G[] = mpkt_in[] and therefore the subsequent loop on the
members of UNT invokes the same procedures (either ReceiveMessage or ReceiveAck) in
the same order, and with equivalent parameters.


To see that the ReceiveMessage call preserves the inductive hypotheses and generates
equivalent side effects, simply follow the reasoning earlier in this proof for the case of
a constellation that is triggered by the post-critical receipt of an original message. To
see that the ReceiveAck call preserves the inductive hypotheses and generates equivalent
side effects, follow the reasoning earlier in this proof for the case of a constellation that is
triggered by the post-critical receipt of an acknowledgement in response to an original
message.

The last two steps of the call update ghost[P] and flush[P] at J from the donated values
of ghost_height and flush_height, respectively. The fact that donation^G = donation
guarantees that the latter two values are the same in H and H^G. As a result ghost[P] and
flush[P] remain identical in H and H^G as required by both the Interim non-G and the
Convergent Hypotheses.
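
The argument for the non-critical donation is, at its core, a determinism argument: equal states fed an equal donation go through the same steps and end in equal states with equal side effects. The sketch below illustrates that shape only. It is a hypothetical model: the classes Donation and Proc are invented for the example, the ordered set UNT is assumed to arrive already constructed inside the donation (the text builds it from several state variables), and the ReceiveMessage and ReceiveAck calls are reduced to log entries.

```python
import copy
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Donation:
    """Hypothetical donation contents; unt is assumed to be pre-ordered."""
    unt: tuple            # entries of the form ("msg" | "ack", payload)
    ghost_height: int
    flush_height: int


@dataclass
class Proc:
    """Toy state of the receiving process J."""
    contact_set: set = field(default_factory=set)
    ghost: dict = field(default_factory=dict)
    flush: dict = field(default_factory=dict)
    calls: list = field(default_factory=list)   # procedures invoked, in order
    sent: list = field(default_factory=list)    # side effects

    def receive_donation(self, p: str, donation: Donation) -> None:
        # Step 1: add the donor to the contact set.
        self.contact_set.add(p)
        # Step 2: queue a co-donation back to the donor.
        self.sent.append(("co_donation", p))
        # Step 3: walk the untimely packets in order.
        for kind, payload in donation.unt:
            name = "ReceiveMessage" if kind == "msg" else "ReceiveAck"
            self.calls.append((name, payload))
        # Step 4: adopt the donated heights.
        self.ghost[p] = donation.ghost_height
        self.flush[p] = donation.flush_height


donation = Donation(unt=(("msg", "m1"), ("ack", "m0")), ghost_height=7, flush_height=6)

j_h = Proc(contact_set={"Q"}, ghost={"Q": 5}, flush={"Q": 5})   # J in H
j_hg = copy.deepcopy(j_h)                                        # J in H^G, equal by hypothesis

j_h.receive_donation("P", donation)
j_hg.receive_donation("P", donation)
```

After both calls the two copies compare equal field by field, including the recorded co-donation and the order of the ReceiveMessage and ReceiveAck invocations, which is the property the inductive step relies on.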

• Type of Constellation: A non-critical co-donation packet is received at a process P from
a process J in H.

H Transactions: A single transaction, with an execution of the ReceiveCoDonation
procedure at process P.

Observed behavior in H^G: A single H^G transaction at P that is equivalent to the original
H transaction.

Execution of CBCAST in H^G: The process P executes the ReceiveCoDonation procedure in
H^G. By definition, co-donation packets are only sent post-critically. However it is possible
that P = G. Therefore the possible operative inductive hypotheses are the Interim non-G
Hypothesis, Interim G Hypothesis or the Convergent Hypothesis.

The ReceiveCoDonation call begins with constructing the ordered set UNT. It follows
from the three Hypotheses and from the fact that co_donation^G = co_donation that
UNT^G = UNT, since all the ingredients used in the construction of this set are equivalent
in both histories. In addition the same hypotheses imply that mpkt_in^G[] = mpkt_in[] and
therefore the subsequent loop on the members of UNT invokes the same procedures
(either ReceiveMessage or ReceiveAck) in the same order, and with equivalent parameters.

To see that the ReceiveMessage call preserves the inductive hypotheses and generates
equivalent side effects, simply follow the reasoning earlier in this proof for the case of
a constellation that is triggered by the post-critical receipt of an original message. To
see that the ReceiveAck call preserves the inductive hypotheses and generates equivalent
side effects, follow the reasoning earlier in this proof for the case of a constellation that is
triggered by the post-critical receipt of an acknowledgement in response to an original
message.

The next two steps of the call update ghost[J] and flush[J] at P from the co-donated
values of ghost_height and flush_height, respectively.

The fact that co_donation^G = co_donation guarantees that the latter two values are the
same in H and H^G. As a result ghost[J] and flush[J] remain identical in H and H^G as
required by the three Hypotheses.

In the last step the TryToInstall procedure is invoked. By Lemma this call preserves
the inductive hypotheses and generates equivalent side effects. It follows from Lemma
[23] that the side effects have the correct target set, namely ContactSet^G.

• Type of Constellation: A critical donation packet is received at G from a process P in H.

H Transactions: A single transaction, with an execution of the ReceiveDonation procedure
at process G.

Observed behavior in H^G: The original trigger (the receipt of a donation packet)
disappears, since we removed that packet in H^G. The queuing of the co-donation packet also
disappears, because the co-donation packet is removed as well. What remains are the
side effects of all the sub-transactions, and in addition there is a new trigger event for
each untimely packet in the P→D channel, with the exception of the acknowledgement
packets for original messages, since these packets do not get cloned. The new triggers
and the existing side effects line up perfectly. By Corollary [0] there is an order preserving
one-to-one correspondence between the sub-transactions and the untimely message
packets and forwarded acknowledgement packets in the P→D channel. We carefully labeled the
side effects and the new triggers so that they pair up according to that correspondence
to make complete transactions. In addition, there are untimely ghost and flush packets
in the P→D channel whose clones give rise to additional triggers for which there are no
corresponding sub-transactions in H. This is not a problem: these naked triggers simply
represent transactions that do not have side effects. Our only burden is to prove that
if H^G executes the CBCAST algorithm, then these ghost and flush triggers will indeed
produce no side effects and preserve the inductive hypotheses.

Execution of CBCAST in H^G: The critical donation packet is the first post-critical packet
from P that G processes in H. We carefully labeled the untimely packets in the P→D channel
so that they are all processed within the critical donation constellation. Therefore in H^G
process G does not process any packets from P in the interval between the critical
constellation and the P-donation constellation. As a result at the start of the constellation
we have in H^G, according to Lemma [SJI71 [5] and [T])

ghost[P]_{G} = ghost[P]_{G@v_crit} ≤ ghost_height_{P@v_crit}
             = (cur_view + v_gap)_{P@v_crit} < v_crit

flush[P]_{G} = flush[P]_{G@v_crit} ≤ flush_height_{P@v_crit}
             = (cur_view + v_gap)_{P@v_crit} < v_crit

Therefore when the P-donation constellation starts, the value of flush[P] at G is too low
for G to have already installed view v_crit. Therefore G is still in its interim period (see
ESI), and therefore the operating inductive hypothesis is the Interim G Hypothesis.

All process G does in H^G is process all the clones of untimely packets in the P→D channel.
In H, process G executes the ReceiveDonation procedure. This means that it adds P to
ContactSet, then queues a co-donation to P, and only then proceeds to process a similar
sequence of packets, with the ghosts and flushes excluded. The addition of P to ContactSet
does not violate the Interim G Hypothesis. The fact that a co-donation is queued in H but not
in H^G is what we expect since we explicitly removed the critical donation packets from
H^G.

So before either history processes the first untimely packet the Interim G Hypothesis still
holds and the side effects meet our expectations. We have to show that the Donation
Sub-Hypothesis also holds at this point. We have already shown that the values of
ghost[P] and flush[P] at G are no higher than the critical values of ghost_height and
flush_height, respectively, at P. This proves that the Donation Sub-Hypothesis holds at
that point.

We are going to show, by induction, that up to the i-th untimely clone the history H^G
looks like an execution of CBCAST and the Donation Sub-Hypothesis holds. This means
that we assume that in H^G process G has processed all the clones of untimely packets
in the P→D channel up to, but not including, the i-th clone. We assume that in H, process
G has executed all the sub-transactions that correspond to the same clones, excluding
the ghost and flush clones which do not have corresponding sub-transactions. We also
assume that at that point the Donation Sub-Hypothesis holds. Let us look now at the
i-th clone. What does G do with it in H and H^G? We look at each case separately.

— The clone is a message packet. In this case G invokes the ReceiveMessage
procedure in both histories. The message can be either original or forwarded. We
have already analyzed similar cases earlier in the proof (post-critical message receipt)
and showed that in both cases the Interim G Hypothesis is preserved. Since the
Donation Sub-Hypothesis only differs from the Interim G Hypothesis with respect
to variables that are not used by the ReceiveMessage procedure, the same analysis
holds without change in this case as well.

— The clone is an acknowledgement packet. In this case G invokes the ReceiveAck
procedure in both histories. The packet has to be in response to a forwarded message.
We have already analyzed a similar case earlier in the proof (post-critical receipt of
a forwarded acknowledgement packet) and showed that the Interim G Hypothesis
is preserved. Since the Donation Sub-Hypothesis only differs from the Interim G
Hypothesis with respect to variables that are not used by the ReceiveAck procedure,
the same analysis holds without change in this case as well.

— The clone is a ghost packet. In this case G invokes the ReceiveGhost procedure
in H^G, but there is no corresponding action in H. The procedure produces no side
effects, which is what we would expect (since G does not do anything at all in H), but
it does increase ghost[P]. This deviation is allowed by the Donation Sub-Hypothesis,
as long as the value ghost[P] does not exceed the critical value of ghost_height at P.
This is guaranteed by Lemma [TOl since the packet being processed is untimely and
therefore queued by P pre-critically.

— The clone is a flush packet. This case is argued similarly to the ghost case as far
as state is concerned. However a flush packet may cause side effects. Since G does
nothing in H, there must not be any side effects in H^G. For there to be any side
effects in H^G the flush value must be high, i.e. it must be equal to cur_view + v_gap.
But this is impossible since Lemma fTOl guarantees that the flush value is at most the
critical value of flush_height at P, and Lemma [5] guarantees that the latter value is
lower than v_crit, while the current value of cur_view + v_gap at G is at least v_crit.

We have established that once the process G is done processing all the sub-transactions
(in H) and all the untimely packets (in H^G), the Donation Sub-Hypothesis still holds.

This is the end of the constellation in H^G. What are the values of ghost[P] and flush[P]
at G in this history? Since all the untimely packets are now processed, it follows
from Lemma [iQ] that these values are equal to the critical values of ghost_height and
flush_height, respectively, at P.

The donation transaction is not yet over in H, however. As the last step in the
ReceiveDonation procedure, G updates its values for ghost[P] and flush[P] from the
donated values of ghost_height and flush_height that it receives as a donation from P. Since
the P donation reflects its critical state, this last step restores the Interim G Hypothesis
and we are done.

• Type of Constellation: A critical co-donation packet is received at a process P from G in 
H. 

H Transactions: A single transaction, with an execution of the ReceiveCoDonation
procedure at process P.

Observed behavior in H^G: Like the case of a critical donation, the observed behavior of a
critical co-donation constellation in H^G is made up of triggers and sub-transaction side
effects that line up perfectly:

— Side effects of the co-donation transaction in H that carry over to H^G.

— Trigger events, made up of the processing events of cloned and zombied packets:

* A trigger for each clone and zombie of an untimely packet in the D→P channel.
Original message packets and flush packets do not get cloned on this channel,
so they have no corresponding triggers. Ghost packets get both cloned and
zombied, so each of these packets has two corresponding triggers.

* A trigger for each clone of a post-critical, pre-P-donation packet in the G→D
channel. Acknowledgement packets on this channel do not get cloned, so they
do not have a corresponding trigger.

According to Lemma [THl if the invocation of the TryToInstall procedure at the end of
the ReceiveCoDonation procedure installs the pending views, then UNT is empty and
there are no sub-transactions. According to Corollary [3 whatever sub-transactions do
exist line up perfectly with the H^G triggers for message and acknowledgement packets.
According to Lemma [2171 the side effects of TryToInstall line up with a final, post-critical
cloned flush packet trigger.

Therefore, if TryToInstall does not install pending views, then all the ghost and flush
triggers produce no side effects. If TryToInstall does install the pending views, then
there are no triggers other than ghost and flush triggers, and none of them produces
any side effects except possibly the last one. Our challenge is to prove that the CBCAST
protocol produces exactly those side effects while preserving the inductive hypothesis.

Execution of CBCAST in H^G: When G receives the critical donation packet from P, it adds
P to its ContactSet and queues the critical co-donation packet to P. Prior to receiving
the donation, P is not in G's ContactSet (see Lemma iM2]) and therefore it does not
queue any packets to P. As a result, the co-donation packet is the first packet that P
receives from G in H. Therefore the value of ghost[G] and flush[G] at P remains equal
to the original value that P sets when it executes the critical invocation of|protJom| 


ghost[G]_{P@Crit(G→P)} = ghost^G[G]_{P@Crit(G→P)} = ghost[D]_{P@v_crit} ≤ ghost_height_{D@v_crit}


Where the last inequality follows from Lemma [8H71). We also know from Lemma [8] that
ghost_height_{D@v_crit} < v_crit and therefore P does not have a sufficient value of flush[G]
at the start of the constellation to allow for the installation of view v_crit. As a result
the operating inductive hypothesis at the onset of the constellation is the Interim non-G
Hypothesis, and from the inequality above it follows that the First Co-Donation
Sub-Hypothesis holds as well.


To complete the proof for this constellation, we first use induction to show that the First
Co-Donation Sub-Hypothesis holds, and the expected side effects occur, up to the last
untimely clone or zombie. Then we use a second induction to deal with the post-critical
clones and the TryToInstall side effects.

We start by showing that up to the i-th untimely clone the history H^G looks like an
execution of CBCAST and the First Co-Donation Sub-Hypothesis holds. This means that
we assume that in H^G, process P has processed all the clones and zombies of untimely
packets in the D→P channel up to, but not including, the i-th clone or zombie. We assume
that in H, process P has executed all the sub-transactions that correspond to the same
clones and zombies, excluding the ghost clones and flush zombies which do not have
corresponding sub-transactions. Let us look now at the i-th clone or zombie. What does
P do with it in H and H^G? We look at each case individually.


— The clone is a message packet. In this case P invokes the ReceiveMessage
procedure in both histories. The packet has to be a forwarded message packet. We
have already analyzed similar cases earlier in the proof (post-critical message receipt)
and showed that in both cases the Interim Non-G Hypothesis is preserved. Since the
First Co-Donation Sub-Hypothesis only differs from the Interim Non-G Hypothesis
with respect to variables that are not used by the ReceiveMessage procedure, the
same analysis holds without change in this case as well.


— The clone is an acknowledgement packet. In this case P invokes the ReceiveAck
procedure in both histories. We have already analyzed a similar case earlier in the
proof (post-critical receipt of a forwarded acknowledgement packet) and showed
that the Interim Non-G Hypothesis is preserved. Since the First Co-Donation
Sub-Hypothesis only differs from the Interim Non-G Hypothesis with respect to variables
that are not used by the ReceiveAck procedure, the same analysis holds without
change in this case as well.


— The clone is a ghost packet. In this case P invokes the ReceiveGhost procedure
in H^G, but there is no parallel action in H. The procedure produces no side effects,
which is what we would expect (since P does not do anything at all in H), but it
does increase ghost^G[G]. This deviation is allowed by the First Co-Donation
Sub-Hypothesis, as long as the value ghost^G[G] does not exceed ghost_height_{D@v_crit}. This
is guaranteed by Lemmas [TUI and 151 since the packet being processed is untimely and
therefore queued by D pre-critically.

— The zombie is a flush packet. This case is argued similarly to the ghost case as
far as state is concerned with the additional fact that

flush_height_{D@v_crit} < ghost_height_{D@v_crit} < v_crit

which follows from Lemma [HI The inequality implies that in H^G the value of flush[G]
remains low and as a result there are no side effects. This is what we expect since
P does nothing in H and therefore does not produce side effects there.

We have established that once the process P is done processing all the pre-critical
sub-transactions (in H) and all the untimely packets (in H^G), the First Co-Donation
Sub-Hypothesis still holds and all the side effects are as expected. At this point in H^G the
process P has already processed the last pre-critical ghost and flush packets from G.
Therefore by Lemma [TUI

ghost^G[G] = ghost_height^G_{G@v_crit} = ghost_height^G_{D@v_crit} = ghost_height_{D@v_crit}

flush^G[G] = flush_height^G_{G@v_crit} = ghost_height^G_{D@v_crit} = ghost_height_{D@v_crit}

Where the two rightmost equalities in each line follow from the Second Pre-Critical and
First Pre-Critical Hypotheses, respectively. We also know that

ghost_height^G_{G@v_crit} ≤ ghost_height^G_{G@Crit(P→G)} = ghost_height_{G@Crit(P→G)}

To see why the left inequality is true, notice that Crit(P → G) ≺ Crit(G → P) (this is
mediated by the critical co-donation packet) and therefore we can assume by induction
that G executes CBCAST up to constellation Crit(P → G). Now the inequality follows
from the fact that ghost_height is monotone increasing and @v_crit ≺ Crit(P → G). The
right equality follows from the Interim G Hypothesis.

Taken together these relations prove that the Second Co-Donation Sub-Hypothesis now 
holds. 

We divide the rest of the analysis into two parts. First, assume that the TryToInstall
invocation at the end of the ReceiveCoDonation procedure fails to install the pending
views.

We will show that up to the i-th post-critical clone the history H^G looks like an execution
of CBCAST and the Second Co-Donation Sub-Hypothesis holds. This means that we
assume that in H^G process P has processed all the clones of post-critical packets in
the G→D channel up to, but not including, the i-th clone. We assume that in H, process P
has executed all the sub-transactions that correspond to the same clones, excluding the
ghost and flush clones which do not have corresponding sub-transactions. Let us look now at
the i-th clone. What does P do with it in H and H^G? We look at each case individually.

— The clone is a message packet. In this case P invokes the ReceiveMessage
procedure in both histories. We have already analyzed similar cases earlier in the
proof (post-critical message receipt) and showed that in both cases the Interim Non-G
Hypothesis is preserved. Since the Second Co-Donation Sub-Hypothesis only
differs from the Interim Non-G Hypothesis with respect to variables that are not
used by the ReceiveMessage procedure, the same analysis holds without change in
this case as well.

— The clone is an acknowledgement packet. This case does not occur since 
post-critical acknowledgement packets do not get cloned. 

— The clone is a ghost packet. In this case P invokes the ReceiveGhost procedure
in H^r, but there is no parallel action in H. The procedure produces no side effects,
which is what we would expect (since P does not do anything at all in H), but it
does increment ghost^r[G]. This deviation is allowed by the Second Co-Donation
Sub-Hypothesis, as long as the value ghost^r[G] does not exceed ghost_height_{G@Crit(P→G)}.
This is guaranteed by Lemma [IHl since the packet being processed is queued by G
before Crit(P → G).

— The clone is a flush packet (see footnote 10). This case is argued similarly to the ghost
case as far as state is concerned, with the additional fact that

flush_height_{G@Crit(P→G)} ≤ ghost_height_{G@Crit(P→G)}

which follows from Lemma [gp. However, a flush packet may cause side effects.
Since P does nothing in H, there must not be any side effects in H^r.

Suppose that the flush packet does have side effects. For this to happen, the
TryToInstall invocation in ReceiveFlush must install the pending views. To do that
it is required that for every Q ∈ LiveSet^r we have flush^r[Q] = cur_view^r + v_gap^r.
Since the Second Co-Donation Sub-Hypothesis holds, we know that

LiveSet^r = LiveSet
cur_view^r = cur_view
v_gap^r = v_gap

flush^r[Q] = flush[Q] for all Q ∈ LiveSet \ {G}

Therefore in H, for any live Q ≠ G we have flush[Q] = cur_view + v_gap.

Since the flush packet had side effects in H^r we know that the packet was P_FLUSH(v)
with v = cur_view + v_gap. Since the original packet was queued by G before it
processed the donation from P it follows that

v ≤ flush_height_{G@Crit(P→G)}

In H, right before calling TryToInstall in ReceiveCoDonation, process P updates its
value of flush[G], setting it to

flush[G](P) = co-donation.flush_height = flush_height_{G@Crit(P→G)} ≥ v = cur_view + v_gap

It follows that the invocation of TryToInstall in the ReceiveCoDonation procedure
does succeed in installing the pending views in H, contrary to our assumptions.

[Footnote 10] Untimely flushes in H^r are zombies, but post-critical flushes are clones.

We have established that once the process P is done processing all the post-critical
sub-transactions (in H) and all the post-critical packets (in H^r), the Second Co-Donation
Sub-Hypothesis still holds and all the side effects are as expected.

At this point in H process P proceeds to update its values of ghost[G] and flush[G]:

ghost[G] = co-donation.ghost_height = ghost_height_{G@Crit(P→G)}
flush[G] = co-donation.flush_height = flush_height_{G@Crit(P→G)}

In H^r, P has processed all the post-critical clones, including the last of the post-critical
ghost and flush packets from G. It follows from Lemma [TU] that at this point

ghost^r[G] = ghost_height_{G@Crit(P→G)}
flush^r[G] = flush_height_{G@Crit(P→G)}

Therefore ghost^r[G] = ghost[G] and flush^r[G] = flush[G] and the Interim Non-G Hypothesis
is restored. The last step in H is an invocation of the TryToInstall procedure, which
by assumption fails to install the pending views. Therefore it produces no side effects
and does not change any state variables.

We are finally left with the case where TryToInstall does succeed in installing the pending
views. It follows from Lemma ITK] that in this case UNT is empty and no sub-transactions
are executed in H. In addition Lemma 20 shows that the last regular packet queued by
G prior to processing the donation from P is k_last = P_FLUSH(v) with v = cur_view + v_gap.

It follows that the only post-critical clones in H^r are ghost and flush packets, and their
processing, up to k, causes only allowable state changes and generates no side effects
(this is shown in exactly the same way as in the previous case, where TryToInstall does
not install pending views).

So at last we are at the point where in H process P is about to update its ghost[G]
and flush[G] values and invoke TryToInstall, which is going to succeed in installing the
pending views. In H^r process P is about to process k = P_FLUSH(v), the last cloned packet
from G. The Second Co-Donation Sub-Hypothesis still holds.

Since TryToInstall succeeds in installing the pending views, it follows that in H, after P
updates flush[G], we have flush[G] = cur_view + v_gap. Therefore

v = cur_view + v_gap = co-donation.flush_height = flush_height_{G@Crit(P→G)}

By Lemma |S| we have

cur_view + v_gap ≥ co-donation.ghost_height = ghost_height_{G@Crit(P→G)}
                 ≥ flush_height_{G@Crit(P→G)} = cur_view + v_gap

and therefore P also updates ghost[G] = cur_view + v_gap.

In H^r the first step in ReceiveFlush sets flush^r[G] = v = cur_view + v_gap, and it follows
from Lemma IMS] that at that point we already have

ghost^r[G] = v = cur_view + v_gap


Therefore, if we hold both H and H^r at the point where both histories are about to invoke
the TryToInstall procedure (in the ReceiveCoDonation and the ReceiveFlush procedures,
respectively), the Interim Non-G Hypothesis is restored.


It follows from Lemma 29 that the execution of TryToInstall in both histories generates
the expected side effects and causes the interim period to end and the Convergent
Hypothesis to hold. Lemma 21 ensures that the side effects have the correct target set.
This concludes the proof of the theorem and shows that H and H^r perform the same
computation.


7 Causality and Progress With The CBCAST Algorithm 

7.1 The Goals of the Analysis 

To show that the algorithm works as expected, we have to prove two things. One is the preservation 
of causality, meaning that a message is only delivered at a process after all the messages on which 
it depends have already been delivered. The second one is the guarantee of progress - that is to 
say that all messages eventually get delivered. It is well known (see [10]) that progress cannot be 
guaranteed in the presence of failures, and it follows from m that a similar limitation exists in our 
model of dynamic membership, even without failures. Therefore we can only prove somewhat less 
than a full guarantee of progress. 

7.2 The Causal Order Property and The Progress Property 

Definition 27. We say that a message m is familiar to a process P if either 

• m is delivered at P. 

• There is a sequence of processes 

P_0, P_1, ..., P_n = P where n > 0


such that 

1. For each i < n, the process Pi is the parent of process Pi+i, i.e. = 

*^join(Ci-|- 1 j Pi ) ■ 

2. n is delivered at Pq prior to the tij(Pj^)(Po)'’'^ event. 

Definition 28. We say that a history H of the CBCAST protocol has the Causal Order Property
if messages in H are delivered in an order that respects their causal relationships. Technically, this
means that if a process P originally broadcasts a message m that is eventually delivered at a process
Q then

1. every message that was broadcast by P prior to broadcasting m is already familiar to Q at the
delivery of m.

2. every message that is familiar to P at the broadcasting of m is already familiar to Q at the
delivery of m.

Definition 29. We say that a history H of the CBCAST protocol has the Progress Property if 
every message that is originally broadcast by a non-halting process in H eventually becomes familiar 
to every non-halting process in H. 

Theorem 8 (Causal Order Theorem). The Causal Order Property holds for every conforming 
history of the CBCAST protocol. 

Theorem 9 (Restricted Progress Theorem). The Progress Property holds for every finite-join 
conforming history of the CBCAST protocol. 


7.3 Causal Order And Progress In Reduced Histories 

As a first step in proving the two theorems, we show that if either of the two properties holds in the
reduction of a transactional history, then it holds in the original history as well. This allows us to
ignore any finite number of process join notifications that may occur in the course of the history. In
our proofs we make use of the fact that H and H^r have been labeled using a common label space
ℒ. This allows us to compare the timing of events that occur in the two histories. As usual we
denote by G the first joining process in H, with D denoting its parent. We will casually refer to
"time" when we technically mean "constellation label". The critical time is the constellation ℓ_{v_crit}
in the joint label space, when G joins in H.

Definition 30. We say that a message m is taken up at process P if the process moves the
message m into ReceiveSet (this takes place at a labeled step of the ReceiveMessage procedure). We
will see later that the same message can be received by the same process many times, but is only
taken up once.

We need the following basic facts: 

• By construction H^r and H have the same message broadcast requests (A^r = A) and those
requests are labeled the same way in both histories. This means that the processes in H^r
originate the same message broadcasts, at the same time, as they do in H.

• We know from the History Equivalence Theorem (Theorem 7) that every process other than
G delivers the same messages, at the same time, in both histories; that the same is true for
G post-critically; and that pre-critically process G delivers the same messages in H^r that D
delivers in H, at the same time. As a result G is familiar, at any post-critical time, with the
same messages in both histories. For other processes in H this is true without restriction.

• It follows from the proof of the History Equivalence Theorem that every process other than
G takes up the same messages, at the same time, in both histories; that the same is true
for G post-critically; and that pre-critically process G takes up the same messages in H^r
that D takes up in H, at the same time. See in particular the parts of the proof that relate
to receiving message packets, donation packets and co-donation packets. Notice that cloned
message packets always result in the message being discarded rather than being taken up.

Theorem 10. Let H be a transactional history that includes at least one join notification, and let
H^r be its reduction. If the Causal Order Property holds in H^r then it holds in H as well.

Proof. Let P and Q be any processes in H and let m be a message that originates at P and is
eventually delivered at Q. Let n be a message that is either originated by P prior to originating m
or is familiar to P when it originates m.

If P originates n prior to m in H, then P also originates n prior to m in H^r. If P is familiar with
n when it originates m in H, it is also familiar with it in H^r when it originates m. Since Q delivers
m in H, it also delivers m in H^r. Because the Causal Order Property holds in H^r this implies that
Q is familiar with n at the time that it delivers m in H^r. If that time is post-critical, then Q is
also familiar with n at the same time in H and we are done. If the time is pre-critical, then Q ≠ G
(because G does not exist in H pre-critically) and therefore Q is familiar with n at the same time
in H regardless. □

Theorem 11. Let H be a transactional history that includes at least one join notification, and let
H^r be its reduction. If the Progress Property holds in H^r then it holds in H as well.

Proof. Let P and Q be two processes in H that never halt, and let m be a message that is originated
by P. Then P and Q, as processes in H^r, are also non-halting, and P originates m in H^r as well.
Since the Progress Property holds in H^r, Q eventually becomes familiar with m in H^r. If this
happens post-critically or if Q ≠ G then Q also becomes familiar with m in H. If it happens
pre-critically and Q = G then after the critical time G becomes familiar in H with all the messages that
G was familiar with in H^r pre-critically, since it does not halt. Therefore in all cases Q eventually
becomes familiar with m in H. □

Transactional histories are artificial and do not arise naturally the way conforming histories do. In 
addition, the reduction of a transactional history is rarely transactional. Therefore we need the 
following lemma. 

Lemma 31. Let H be a conforming history and let tr(H) be its transactional closure (see the Fault
Theorem). If tr(H) has the Causal Order Property then so does H. If tr(H) has the
Progress Property then so does H.

Proof. We start with the Causal Order Property. Suppose that in H a message m originates at
process P and is delivered at process Q. Suppose that a message n is either originated by P prior
to originating m, or else is familiar to P at the time that it originates m. History tr(H) contains all
the events of H, and by construction an H dequeuing event is processed the same way in tr(H) as
it is in H. Therefore m originates at P and is delivered at Q in tr(H) and n is originated at P or is
familiar to P in tr(H) prior to the origination of m. Since tr(H) has the Causal Order Property, n
must be delivered at Q prior to the delivery of m, in tr(H). By construction tr(H) does not contain
any new events that precede existing H events. Therefore n must be delivered at an original H
event, which means that n is delivered in H as well. Therefore H has the Causal Order Property.

To prove the Progress Property suppose that in H a message m originates at a non-halting process
P and suppose that Q is also a non-halting process, not necessarily distinct from P. Then in tr(H)
both P and Q are non-halting and P originates m. Since tr(H) has the Progress Property, m is
delivered at Q in tr(H), as part of the processing of some event e. If e is an original H event then
m is delivered in H and we are done. Otherwise e is a vacuum event that is generated by the
vacuum loop. But the vacuum loop is only applied to halting processes, and Q does not halt. This
concludes the proof. □

7.4 Stunting


A join-free history is a history H where no processes join. In other words all the processes in H are
members of view zero, and all view changes are removals of existing members. The discussion so far 
implies that if only we could prove that all join-free transactional histories enjoy the Causal Order 
Property and the Progress Property, then any finite-join conforming history would enjoy these 
properties as well. We are going to prove exactly that in the next section. As far as the Restricted 
Progress Theorem is concerned, nothing more is claimed. However, the Causal Order Theorem 
claims that the Causal Order Property holds for all conforming histories, not only finite-join ones. 
We plug this gap by introducing the notion of stunting. 

Definition 31. Let H be a transactional history and let 0 < v < be any view change in H. 
Then the stunting of H at view v, denoted , is the stunted history (see Definitions^ that is 
obtained from H by taking only that part of H that precedes the view change constellation ℓ_v. We
now define the stunting as follows:

All the packets and notifications have the same faulting characteristics that they inherit from H.
Since H is transactional there are no dropped packets or notifications.

The proof that the stunted history is a conforming history is tedious but straightforward and we
omit it. Moreover, if all the processes in the stunted history start with the same internal state they
had in H then the stunted history becomes a history of the execution of the CBCAST protocol, and
at any point in time up to ℓ_v the internal state of the processes in the stunted history is identical
to the internal state of the same processes in H.

Theorem 12. Assume that the Causal Order Property holds for all join-free transactional histories. 
Then it holds for all conforming histories. 

Proof. We have already shown that under the assumption the property holds for all finite-join
conforming histories. Let H be an infinite-join conforming history and let P and Q be processes
in H. Suppose that P originates a message m and that m is delivered at Q. Assume also that a
message n is either originated by P prior to the origination of m or that n is familiar to P at the
time that it originates m. Since H is an infinite-join history, there is a view v for which the view
change notification event occurs after Q delivers m.

Look at the stunting of H at view v. Since the stunted history has the same state as H up to time
ℓ_v and since m is delivered at Q before time ℓ_v, m is also delivered at Q at the same time in the
stunted history. The origination of m by P occurs before Q delivers m, so it also occurs before ℓ_v
and therefore occurs in the stunted history.

For the same reasons message n is either originated by P prior to m or is familiar to P at the
time m is originated, in the stunted history. Since the stunted history is a finite-join conforming
history, the Causal Order Property holds there and therefore Q is familiar with n at the time that
it delivers m. Therefore the same happens in H and we are done. □

All we have left to do is prove the Causal Order Theorem (Theorem 8) and the Restricted Progress
Theorem (Theorem 9) in the case of join-free transactional histories. For the rest of the paper we
will consider only such histories. In particular, we will assume without further comment that every
view notification is a removal, and we will not refer to message familiarity (see Definition 27) since
it is now synonymous with message delivery. Any definitions (and most importantly, the notion of
effective route) will only be assumed to make sense for join-free transactional histories.

In a join-free history of CBCAST the values of LiveSet, ContactSet and MSet have the relations 

LiveSet = ContactSet 
LiveSet ⊆ MSet

Therefore we will assume that all live processes are contacted and are members of the current view. 

7.5 The Central Lemma 

Both theorems rely heavily on a lemma which we refer to as the Central Lemma, and which we
will introduce after a few definitions.

Definition 32. The installation gap of view v at process P is the gap between v and the highest
view known to P at the time that v is installed at P. It is the value of v_gap when the view v is
installed at process P. More precisely, it is the value of v_gap at labeled step [7] of the TryToInstall
procedure when cur_view = v. The installation gap of a view v at process P is denoted by gap_v(P).

Lemma 32. Let P and Q be processes and suppose that P installs view v when Q is still alive
(Q ∈ LiveSet). Then P must have received a P_FLUSH(v + gap_v(P)) packet from Q prior to
installing the view.

Proof. Looking at the TryToInstall procedure one can verify that flush[Q] = v + gap_v(P) when view
v is installed at P. The claim now follows from Lemma nni and from the monotonicity of flush[Q] 
(Lemma |51ini)). □ 
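
The installation condition used in this proof can be illustrated with a minimal Python sketch. It assumes, following the text, that pending views are installed only when flush[Q] equals cur_view + v_gap for every live process Q. The function name `can_install` and the dictionary representation of flush[] are hypothetical and are not taken from the protocol's pseudocode.

```python
def can_install(flush, live_set, cur_view, v_gap):
    """Return True when every live process has flushed up to cur_view + v_gap.

    Hypothetical helper: `flush` maps a process name to the highest flush
    view received from that process (an assumed representation of flush[]).
    """
    target = cur_view + v_gap
    return all(flush.get(q, 0) == target for q in live_set)


# Q and R have flushed view 3, S is still at view 2.
flush = {"Q": 3, "R": 3, "S": 2}
print(can_install(flush, {"Q", "R"}, cur_view=1, v_gap=2))       # True
print(can_install(flush, {"Q", "R", "S"}, cur_view=1, v_gap=2))  # False
```

In the second call S blocks the installation until either a higher flush packet arrives from S or S leaves the live set, which is the situation Lemma 32 relies on.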

Definition 33. Due to forwarding, each message can be received and acknowledged multiple times
by a process, but the message is moved into ReceiveSet and FwdQueue[] at most once (see labeled
step [7] of ReceiveMessage). Additionally, since a forwarding queue becomes empty after forwarding
(see the protRemove procedure), a sender will only send a message once to any
target. Therefore for any message that is received by a target there is at most one effective sender,
namely the sender that managed to get its message packet not just acknowledged but also appended
to the forwarding queue of the receiver, and only one effective packet that is sent by the effective
sender. A message gets forwarded only as a result of a notification of the removal of the effective
sender. As a result, for every process that receives a message, there is at most one effective route
of retransmissions of effective packets leading from the message originator to the process.
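
As a rough illustration of this definition, the following Python sketch models a receiver that acknowledges every packet but takes a message up only once, recording the effective sender. It is a toy model under stated assumptions: the class `Receiver`, the function `effective_route` and the process names are hypothetical, and obsolete-message handling, views and forwarding queues are deliberately left out.

```python
class Receiver:
    """Toy model: a message is taken up at most once per process."""

    def __init__(self, name):
        self.name = name
        self.effective_sender = {}  # message id -> sender whose packet was taken up

    def receive(self, msg_id, sender):
        """Return True if the message is taken up, False if it is a duplicate."""
        if msg_id in self.effective_sender:
            return False  # duplicate packet: acknowledged, then discarded
        self.effective_sender[msg_id] = sender
        return True


def effective_route(receivers, originator, target, msg_id):
    """Walk effective senders backwards from the target to the originator."""
    route = [target]
    while route[-1] != originator:
        route.append(receivers[route[-1]].effective_sender[msg_id])
    return list(reversed(route))


receivers = {name: Receiver(name) for name in ("B", "C")}
receivers["B"].receive("n", "A")  # B takes n up from the originator A
receivers["C"].receive("n", "B")  # C takes n up from B (a forwarded packet)
receivers["C"].receive("n", "A")  # a late packet from A is a duplicate at C
print(effective_route(receivers, "A", "C", "n"))  # ['A', 'B', 'C']
```

Because each receiver stores a single effective sender per message, the backward walk can only ever produce one route, which mirrors the "at most one effective route" claim.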

Lemma 33. Let n be a message and let

ORIG(n) = R_0 → R_1 → R_2 → R_3 → ... → R_k = T where k ≥ 0

be the effective route of n from its originator R_0 to process T. Then for every 0 < i ≤ k the
process R_i is a member of view VIEW(n) and

r(R_0) < r(R_1) < ... < r(R_{k-1})

Proof. Messages are only originated or forwarded by members of the message view to members
of the message view (see labeled step [2] of the protBroadcast procedure and labeled step [4] of
the protRemove procedure). That proves the first claim. Forwarding is triggered by the removal
notification of the effective sender, and view change notifications are only queued to members of
that view. That proves the second claim. □

Lemma 34. When a message is received by a process for the first time it is moved into ReceiveSet
and FwdQueue[]. Therefore for every message n that is received by a process R there is exactly one
effective route from the originator of n to R.


Proof. Following the flow of ReceiveMessage we see that there are three cases where a message can
be discarded without being moved into ReceiveSet and FwdQueue[]. The second and third cases
occur when the message has already been delivered and when the message is still in ReceiveSet.
Neither of these cases can occur the first time the message is received. The first case occurs when
the message is obsolete. To prove the lemma we have to show that in this case as well the message is
not received for the first time. Suppose therefore that a process S sent a message packet P_MSG(n)
to process R, which then discarded the message as obsolete. Either S is the originator of the message
or it forwards n out of its FwdQueue[]. Either way there exists in this case an effective route

G = G_0 → G_1 → ... → G_k = S where k ≥ 0 (1)

where G = ORIG(n) is the originator of n. Notice that k = 0 covers the case where S = G, i.e. the
case where S is the originator of n.

From Lemma 33 we know that r(G_0) < r(G_1) < ... < r(G_{k-1}). In addition r(G_{k-1}) < r(S)
because S forwards n (when k > 0) as a result of the removal of the effective sender G_{k-1}. We also
know that VIEW(n) < r(G) because G originates n. So for all i, we know that r(G_i) > VIEW(n) and
therefore G_i is a member of view VIEW(n).

By assumption, message n was obsolete when received by R from S. This means that R had
already installed view VIEW(n) + 1. It follows from Lemma 32 that process R must have received
a P_FLUSH(> VIEW(n)) packet from every member of VIEW(n) for which R had not yet received a
removal notification. In other words, before R can install VIEW(n) + 1 it must have received either
a P_FLUSH(> VIEW(n)) packet or a n_REM(P) notification from/about every member P ∈ VIEW(n).
This is true in particular for the processes in the effective route above, all of whom are members of
VIEW(n).

We know that R does not receive a n_REM(S) notification before installing view VIEW(n) + 1. This
is because it receives the message packet P_MSG(n) from S after installing the view, and by the
Conforming Packet Axiom this can only happen if R still considers S to be alive. Let i be the index
of the first process G_i in the effective route for which R does not receive a n_REM(G_i) notification
prior to installing view VIEW(n) + 1.

If i = 0, then G_i is the originator of n. Since R does not receive a n_REM(G_0) notification, it must
receive a P_FLUSH(> VIEW(n)) packet from G_0. But G_0 only sends such a packet when cur_view +
v_gap > VIEW(n). Being the originator of n, process G_0 broadcasts n while cur_view = VIEW(n) and
v_gap = 0 (messages are not broadcast when v_gap > 0). This means that R receives n from G_0
before receiving the flush packet and therefore before installing view VIEW(n) + 1, so the obsolete
P_MSG(n) packet from S is not the first receipt of n.

If i > 0, then R receives n_REM(G_{i-1}) prior to installing view VIEW(n) + 1. When that happens,
cur_view + v_gap = r(G_{i-1}) (see labeled step [2] of protRemove). It also implies that r(R) > r(G_{i-1}).
Afterwards, R does not proceed to install VIEW(n) + 1 before receiving P_FLUSH(≥ r(G_{i-1})) from all
the surviving members of VIEW(n).

Now look at process G_i. It forwards its P_MSG(n) when it receives n_REM(G_{i-1}). It then places n in
its WaitSet, together with an instability set that includes R (since r(R) > r(G_{i-1})). It then waits
for WaitSet to clear before sending its first P_FLUSH(≥ r(G_{i-1})). We know that this flush packet is
eventually sent by G_i, by definition of i, so n does indeed leave WaitSet. We also know that the
flush packet is sent while G_i still considers R to be alive, according to the Process Liveness Axiom.
Therefore n must leave WaitSet as a result of G_i receiving a P_ACK(n) packet from R. This means
that R receives n from G_i before installing view VIEW(n) + 1, so the obsolete P_MSG(n) packet from
S is not the first receipt of n in this case either. □

Lemma 35 (Central Lemma). If process P installs view v_m with gap_{v_m}(P) = 0, then any message
n with VIEW(n) < v_m that is received by P is also received by all other processes that install view
v_m.

We will prove the Central Lemma a little later on. First, we use it to prove the Causal Order Theorem
and the Restricted Progress Theorem.

Proof of the Causal Order Theorem. We start with a note on message origination and broadcasting.
The act of originating a message requires two separate steps whenever v_gap > 0. First, placing the
message on LaunchQueue (see labeled step [1] of the protBroadcast procedure), and then later, when
v_gap becomes zero, queuing packets containing the message (see the TryToInstall
procedure). In the context of this proof, we are talking about the second step whenever we talk
about broadcasting a message. This definition of the term "broadcast" can create phantom causal
relationships between messages. Specifically, if message n is delivered after message m is placed
on LaunchQueue but before a packet containing m is queued, we consider m to be dependent on
n, though they are obviously independent. This does not invalidate our arguments, of course,
but it does imply that the algorithm serializes message delivery to a greater degree than seems
necessary. This suboptimal behavior is inherent in the protocol, because every view change is a
global synchronization point that generates excess serialization.

Let P and Q be processes, and let m be a message that is broadcast by P and is delivered at Q.
We have to demonstrate parts 1 and 2 of Definition 28. Remember that message familiarity is now
synonymous with message delivery.

We start by observing that whenever a process P broadcasts a message n, the message is delivered
at P itself before it processes any further notifications from GMS. This follows from the Self Channel
Axiom. This axiom only implies that n is received by P prior to processing any further notifications
from GMS. However, when n is received by P it is immediately deliverable. This is easy to verify
when you notice that P stamps n with its own vector time before broadcasting it. Moreover, the
ReceiveMessage procedure includes a call to Scan, resulting in an immediate delivery of n. As
a result, if P broadcasts n before broadcasting m, and if VIEW(n) < VIEW(m), then n is familiar
to P when it broadcasts m, and we can lump this case under part 2 of Definition 28. This leaves
the case VIEW(n) = VIEW(m). In this case we have VT(n) < VT(m) because n gets stamped by P
before m does.

When P broadcasts m, it has

cur_view = VIEW(m)
v_gap = 0

and therefore gap_{VIEW(m)}(P) = 0.

Also, since m is delivered at Q we know that Q installs VIEW(m). Therefore P and Q meet the
requirements of the Central Lemma. We divide the messages that are broadcast by P or delivered
at P prior to the broadcast of m into the following subsets:

1. For each 0 ≤ k < VIEW(m), the set

M_k = {n | VIEW(n) = k and n is delivered at P}

2. All the messages n with VIEW(n) = VIEW(m) and VT(n) < VT(m)

Our previous observation shows that these sets cover all the messages n of both part 1 and part 2
of Definition 28. The messages in the second category must be delivered at Q prior to the delivery
of m (see section 5.1 of [B]). For a given 0 ≤ k < VIEW(m), the Central Lemma implies that all the
messages in M_k are received by Q.

Suppose that there is a message in M_k that is not delivered at Q, and let n be such a message with
a minimal vector time. Let n' be any message, not necessarily in M_k, such that VIEW(n') = k and
VT(n') < VT(n). Because n ∈ M_k, n is delivered at P. Therefore n' must be delivered at P before
n is delivered (see section 5.1 of [B]), and therefore n' ∈ M_k. By the minimality of n, we know that
n' is delivered at Q. Look at the latest of the following points in time:

• The point at which n enters ReceiveSet(Q) (at labeled step [5] of ReceiveMessage)

• For each n' with VIEW(n') = k and VT(n') < VT(n), the point at which n' is delivered at Q
(labeled step [1] of Scan)

• The point at which Q installs view k (in TryToInstall)

This latest point in time has the following two properties. First, Scan is called shortly after that
point, as can be verified by tracing each of the three code locations. Second, the message n is present
in ReceiveSet(Q) at that point. To see that, notice that by definition n must have already entered
ReceiveSet(Q) (because of the first point in time), so we only have to show that n has not yet been
removed. There are two places where n can be removed. The first is labeled step [1] of Scan, which
is where n is delivered. Since all the above points in time must occur before n can be delivered,
this case can be excluded. The second is the step of TryToInstall where n is removed as an
obsolete message. But all the points in time that we listed must occur before Q installs a view
that is higher than k, so this case can be excluded as well. Therefore the message n is present in
ReceiveSet(Q) at the latest point in time. Moreover, n is deliverable at this point because Q has
already installed view k and all the messages of lower vector time have been delivered. Therefore
the imminent call to Scan will result in the immediate delivery of n to Q (in the case of labeled
step [1] of Scan it may happen even earlier, within the loop). Due to their low view, the messages
in M_k must be delivered at Q before the installation of VIEW(m), which in turn takes place before
the delivery of m. □
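
The proof above compares vector times (for example VT(n) < VT(m)) and relies on messages of lower vector time being delivered first. As a hedged illustration, the Python sketch below shows the usual componentwise partial order on vector timestamps and the classical vector-clock deliverability test commonly associated with causal broadcast. It is an assumption that this matches the rule cited from section 5.1; the corrected protocol analysed here additionally checks views, which the sketch omits, and the function names are hypothetical.

```python
def vt_less(a, b):
    """Strict vector-time order: a <= b in every component and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def deliverable(msg_vt, sender, local_vt):
    """Classical causal-delivery test (view checks omitted).

    The message must be the next one expected from `sender`, and must not
    depend on any message from another process that is still missing locally.
    """
    return all(
        (m == l + 1) if k == sender else (m <= l)
        for k, (m, l) in enumerate(zip(msg_vt, local_vt))
    )


print(vt_less((1, 0, 0), (1, 1, 0)))          # True
print(vt_less((1, 0), (0, 1)))                # False (concurrent)
print(deliverable((1, 1, 0), 1, (1, 0, 0)))   # True
print(deliverable((1, 1, 0), 1, (0, 0, 0)))   # False (a predecessor is missing)
```

In the last call the message from process 1 depends on a message from process 0 that has not yet been delivered locally, so it has to wait, which is the behaviour the minimality argument exploits.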

In order to prove the Restricted Progress Theorem, we need the following lemma. 

Lemma 36. Non-halting processes install all the views. 

Proof. The proof depends on the finiteness condition in a fundamental way. Let v_L be the last 
view. By definition, the non-halting processes are exactly the members of that view. Let P be a 
non-halting process. At some point in time P dequeues a notification of the last view, which by 
assumption is a notification n_REM(X) of the removal of the last halting process X. 

When P processes that notification it forwards all the messages in FwdQueue[X] in order, and 
moves them to its WaitSet. Thereafter, as long as P does not install view v_L, P does not add any 
new messages to WaitSet. This is because messages are added to WaitSet only after broadcasting 
or forwarding a message. But since v_gap > 0 during the period at hand, P does not broadcast any 
messages, and since no further view change notifications occur, P does not forward any messages 
either. All the messages in WaitSet, having been sent by the non-halting process P, are eventually 
acknowledged by all the non-halting processes. 

Since P has already received notice of the removal of all the halting processes, this implies that all 
the messages in WaitSet eventually stabilize and therefore WaitSet empties. Once that happens, 
P sends a P_FLUSH(v_L) packet to all the non-halting processes. This means that every non-halting 
process eventually receives a P_FLUSH(v_L) packet from all the non-halting processes. Once that 
happens, all the views up to and including v_L are installed immediately. □ 
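
The stabilization argument in this proof can be illustrated with a small model. The sketch below is not the paper's pseudocode: the class WaitSetModel and its method names are hypothetical, and it assumes only what the proof states, namely that a message leaves WaitSet once every process in its instability set has either acknowledged it or been removed, and that a flush may be sent only when WaitSet is empty.

```python
class WaitSetModel:
    """Toy model of a wait set (hypothetical names, not the paper's code)."""

    def __init__(self):
        # Maps a message id to its instability set: the processes that
        # have neither acknowledged the message nor been removed.
        self.records = {}

    def _prune(self):
        # A message whose instability set is empty has stabilized.
        for msg_id in [m for m, pending in self.records.items() if not pending]:
            del self.records[msg_id]

    def add(self, msg_id, recipients):
        # Called after broadcasting or forwarding a message.
        self.records[msg_id] = set(recipients)
        self._prune()

    def on_ack(self, msg_id, sender):
        # An acknowledgment stabilizes the message with respect to sender.
        if msg_id in self.records:
            self.records[msg_id].discard(sender)
        self._prune()

    def on_removal(self, process):
        # A removal notification stabilizes every message with respect
        # to the removed process.
        for pending in self.records.values():
            pending.discard(process)
        self._prune()

    def can_flush(self):
        # Assumption taken from the proof: flush only when WaitSet is empty.
        return not self.records


ws = WaitSetModel()
ws.add("m1", {"Q", "X"})   # m1 was sent to Q and to the halting process X
ws.on_removal("X")         # X is removed; Q has not acknowledged yet
blocked = not ws.can_flush()
ws.on_ack("m1", "Q")       # Q acknowledges; m1 stabilizes
ready = ws.can_flush()
```

In this run the flush is blocked after the removal of X alone and becomes possible only after Q acknowledges m1, which mirrors the step of the proof in which WaitSet empties once all non-halting processes have acknowledged.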

Proof of the Restricted Progress Theorem. Let P and Q be two non-halting processes and suppose 
P broadcasts a message m. A join-free history can only have a finite number of view changes, so 
we have to show that m is eventually delivered at Q. 

Let v_L be the last view. By Lemma 36 both P and Q install v_L, and by necessity they do it with 
gap_{v_L}(P) = gap_{v_L}(Q) = 0. By the Central Lemma this implies that all messages that were received 
by P prior to view v_L are also received by Q and vice versa. More generally, all the non-halting 
processes receive the same messages prior to installing view v_L. 
To see that the same happens with messages that are received after installing view v_L, observe that 
since view v_L is composed exclusively of non-halting processes, all messages sent in view v_L are 
broadcast by non-halting processes and therefore must be received by all members of v_L. Therefore, 
all the non-halting processes receive the same messages. 

We will show now that non-halting processes also deliver the same messages. Let n be a minimal 
message that is delivered at a non-halting process P but not delivered at a non-halting process Q. We 
know by the discussion above that n is received by Q, and therefore by an earlier lemma n enters Q’s 
ReceiveSet. We know by the Causal Order Theorem that all the messages that message n depends on are 
delivered at P, and by minimality they are also delivered at Q. Why would n not be delivered? Looking 
at the Scan and TryToInstall procedures, we see that n enters ReceiveSet when cur_view < VIEW(n), 
and n is not delivered as long as Q does not install VIEW(n). In addition n is removed from 
ReceiveSet without being delivered if it has not been delivered by the time Q installs VIEW(n) + 1. 
From Lemma 36 we know that Q installs view VIEW(n). Look at the latest of the following three 
points in time: 

1. Q has received the effective packet P_MSG(n) and placed n in ReceiveSet. (Occurs at 
the conclusion of a labeled step of ReceiveMessage.) 

2. Q has just delivered the last message that n depends on. (Occurs at the conclusion of a labeled 
step of Scan.) 

3. Q has just installed VIEW(n). (Occurs at a labeled step of TryToInstall.) 

When that latest moment occurs, two conditions are met. First, Q has cur_view = VIEW(n), because 
at least the third moment occurs when cur_view = VIEW(n), and neither of the other two moments 
can happen after a higher view is installed. Second, the message n is in ReceiveSet because it enters 
the set at the first moment and neither way out of the set (delivery of n or installation of a higher 
view) is available until after all three moments in time have occurred. As a result, the message n 
becomes deliverable at that moment. Looking at the relevant code locations, one can see that the 
Scan procedure is either invoked or (in the second case) continues to be executed, resulting in the 
delivery of n. 
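
The "latest of three moments" argument can be checked on a toy model. The following sketch is an illustration under simplifying assumptions, not the Scan and TryToInstall procedures themselves: causal dependencies are modeled as explicit sets of message names rather than vector times, and ProcessModel, Msg, and all method names are hypothetical. It assumes, as the text states, that a message is deliverable once it is in ReceiveSet, the current view equals the message view, and all its dependencies have been delivered, and that each of the three events is followed by a scan.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Msg:
    name: str
    view: int
    deps: frozenset  # names of the messages this message depends on


class ProcessModel:
    """Toy receiver (hypothetical); not the paper's procedures."""

    def __init__(self, cur_view=0):
        self.cur_view = cur_view
        self.receive_set = {}
        self.delivered = []

    def deliverable(self, m):
        return m.view == self.cur_view and m.deps <= set(self.delivered)

    def scan(self):
        # Keep delivering until no message in the receive set is deliverable.
        progress = True
        while progress:
            progress = False
            for m in list(self.receive_set.values()):
                if self.deliverable(m):
                    del self.receive_set[m.name]
                    self.delivered.append(m.name)
                    progress = True

    def receive(self, m):
        self.receive_set[m.name] = m
        self.scan()

    def install(self, view):
        # Messages of lower views become obsolete and are dropped.
        self.cur_view = view
        for name in [k for k, m in self.receive_set.items() if m.view < view]:
            del self.receive_set[name]
        self.scan()


def run(order):
    q = ProcessModel(cur_view=0)
    d = Msg("d", 1, frozenset())
    n = Msg("n", 1, frozenset({"d"}))
    events = {
        "receive_n": lambda: q.receive(n),
        "receive_d": lambda: q.receive(d),
        "install_1": lambda: q.install(1),
    }
    for event in order:
        events[event]()
    return q.delivered


# Whichever of the three events comes last, n ends up delivered after d.
results = [run(order) for order in permutations(["receive_n", "receive_d", "install_1"])]
```

Every ordering of the three events yields the same delivery sequence in this model, because the scan that follows the last event finds n in the receive set with all of its preconditions satisfied.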

We are almost done. We know that all the non-halting processes deliver the same messages, but 
we have to show that they deliver all the messages that originate from non-halting processes. To 
do that, we will show that every process delivers all the messages that it originates itself. Since all 
non-halting processes deliver the same messages, this implies the desired result. 

Suppose that the non-halting process P originates a message n. Since P is non-halting it receives 
n, and by an earlier lemma n must be placed in P’s ReceiveSet. As we already observed in the proof of 
the Causal Order Theorem, when n is placed in the ReceiveSet of its own originator it is immediately 
deliverable, because its VT(n) is derived from P’s vector time in exactly the fashion that makes it 
deliverable. Since message placement in ReceiveSet is followed by a call to Scan (see the labeled 
steps of ReceiveMessage), n is delivered at P and we are done. □ 
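
The claim that a message is immediately deliverable at its own originator can be illustrated with the textbook vector-time delivery rule. This is a sketch under stated assumptions, not the protocol's own definition, which is given elsewhere in the paper and may differ in detail: it assumes that a process's vector time counts the messages delivered from each sender, that VT(n) is the originator's vector time with its own entry incremented by one, and that the function names stamp, deliverable and deliver are hypothetical.

```python
def stamp(local_vt, sender):
    """Vector time assigned to a new message by its originator (assumed rule)."""
    vt = list(local_vt)
    vt[sender] += 1
    return vt


def deliverable(msg_vt, sender, local_vt):
    """Textbook causal-delivery test: the message is the next one expected
    from its sender, and it depends on nothing the receiver has not delivered."""
    next_from_sender = msg_vt[sender] == local_vt[sender] + 1
    deps_satisfied = all(
        msg_vt[k] <= local_vt[k] for k in range(len(local_vt)) if k != sender
    )
    return next_from_sender and deps_satisfied


def deliver(msg_vt, sender, local_vt):
    """Advance the receiver's vector time after delivering the message."""
    new_vt = list(local_vt)
    new_vt[sender] = msg_vt[sender]
    return new_vt


p_vt = [2, 0, 1]                  # vector time of originator P (index 0)
n_vt = stamp(p_vt, 0)             # VT(n) derived from P's vector time
at_origin = deliverable(n_vt, 0, p_vt)
behind_vt = [2, 0, 0]             # a receiver that has delivered less than P
at_lagging = deliverable(n_vt, 0, behind_vt)
```

Under these assumptions the test always succeeds at the originator, since VT(n) differs from the originator's vector time only in the originator's own entry, while a receiver that lags behind in some other entry has to wait.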

7.6 Proving the Central Lemma 

We start with a technical lemma that we will use repeatedly in the proof of the Central Lemma. 
Lemma 37. Let A, B, D and F be processes (not necessarily distinct), and let n be a message, 
meeting the following criteria: 

1. Process D sends an effective P_MSG(n) packet to process F as it broadcasts or forwards n. 

2. Process B is a member of VIEW(n). 

3. r(D) < r(F). 

4. Process A receives a P_FLUSH(f) packet from process F, with r(D) < f < r(B). 

Then process B receives n before process F queues a P_FLUSH(f) packet to process A and before B 
dequeues a notification of view r(F). 


Proof. Original messages are broadcast to all the members of the message view (see the labeled step 
of the protBroadcast procedure; when no processes join, ContactSet = LiveSet, and since v_gap = 0 
here, MSet = LiveSet), and are forwarded to all the surviving members of the message view (see 
the labeled step of protRemove). By condition 2 process B is a member of VIEW(n), and condition 
4 implies that r(D) < r(B), so process D never receives an n_REM(B) notification. Therefore, when 
D broadcasts or forwards message n, it queues a P_MSG(n) packet to process B. By condition 1 we 
know that D also queues a P_MSG(n) packet to F, and that this packet is effective. That means that 
when process F receives the packet, it appends n to its forwarding queue, with D as the sender 
(see the labeled step of ReceiveMessage). Condition 3 implies that F receives an n_REM(D) notification. 
We divide the rest of the proof into two cases. The easy case is when n is still in the forwarding 
queue when the notification is dequeued by F. The harder case is when n has already 
been removed from the forwarding queue. 

If n is still in the forwarding queue, then F forwards n to all the surviving members of view VIEW(n), 
which include B because r(D) < r(B). It then moves n to its wait set, together with an instability 
set that includes B (see the labeled step of protRemove). The moment that F dequeues the removal 
notification of D is also the first point in time at which cur_view + v_gap > r(D) in F (see the labeled 
step of protRemove). Therefore, the flush packet that F sends to A according to condition 4 is 
queued later (see the labeled step of CheckFlush). Since flush packets are only queued when WaitSet 
is empty (see the condition in the ambient block of the same location), the flush packet must be queued 
after message n has left the wait set, i.e. message n must have stabilized in the meantime. But process 
B was initially in the instability set of message n. Message n can stabilize with respect to process 
B either by receipt of a removal notification for process B, or by receipt of an acknowledgment of 
message n by B. In the latter case B must send the acknowledgment to F before becoming aware 
of view r(F) and we are done. The former case is impossible because condition 4 implies that the 
flush packet was sent before process F dequeued a notification of the removal of process B. 


If n is not in the forwarding queue when the n_REM(D) notification is received by F, it means that 
it has already been removed. The only possible cause of removal is the installation, by F, of view 
VIEW(n) + 1 (see the labeled step of the TryToInstall procedure). Let 

e = VIEW(n) + 1 + gap_{VIEW(n)+1}(F) 

F installs view VIEW(n) + 1 only after receiving P_FLUSH(≥ e) from all the surviving members of view 
VIEW(n) (see Lemma 32). In the case at hand process D is one of the surviving members, because 
by our assumption F receives the n_REM(D) notification after installing the new view. This also 
implies that e < r(D) and therefore, by condition 4, e < f, and therefore F sends the P_FLUSH(f) 
packet to A after installing view VIEW(n) + 1. 

If we are lucky, then D sends the effective P_MSG(n) packet before sending P_FLUSH(≥ e) to F. In that 
case n enters the wait set of D with process B in its instability set, and it must exit the wait set 
before the flush is sent. Since r(D) < r(B), this can only happen if B receives and acknowledges n. 
Process B acknowledges the receipt of n before D sends a P_FLUSH(≥ e) packet to F, which is received 
by F before it installs view VIEW(n) + 1, which in turn happens before F sends a P_FLUSH(f) packet 
to A. Process B acknowledges the receipt of n before becoming aware of view r(D), and r(D) < r(F), 
so the acknowledgment is sent before B is aware of the death of F and so we are done in this case. 

If we are not lucky, then D sends a P_FLUSH(≥ e) to F and only later sends an effective packet 
P_MSG(n) to F. Let 

D_0 → D_1 → ... → D_k = D → F where k > 0 

be the effective route of the message n from its originator D_0 through D to F. Let i be the smallest 
integer such that D_i sends a P_FLUSH(≥ e) packet to F before sending an effective P_MSG(n) packet 
along the effective route. Then i > 0 because the originator D_0 broadcasts n while v_gap = 0, so it 
could not have sent a P_FLUSH(≥ e) beforehand, since e > VIEW(n). Process D_i forwards n along the 
effective route after receiving an n_REM(D_{i-1}) notification, which arrives, by the definition of i, after 
D_i already sent a P_FLUSH(≥ e) packet to F. This implies that r(D_{i-1}) > e, which in turn implies 
(by Lemma 32) that F does not install view VIEW(n) + 1 before receiving a P_FLUSH(≥ e) from D_{i-1}. 
By definition, D_{i-1} must send the effective P_MSG(n) along the effective route before sending the 
P_FLUSH(≥ e) packet. In addition, by an earlier lemma r(D_{i-1}) < r(D) < r(B), so D_{i-1} sends message n to 
process B and when it moves n to its wait set it includes process B in the instability set of n. As a 
result, D_{i-1} cannot send a P_FLUSH(≥ e) packet to F before n stabilizes with respect to process B, 
and since r(D_{i-1}) < r(B), this stabilization can only happen as a result of a receipt of a P_ACK(n) 
packet from B. So in this case as well B receives and acknowledges message n, and it must happen 
before F installs view VIEW(n) + 1, which in turn happens before F sends a P_FLUSH(f) packet to A. 
In addition B must send the acknowledgment of receipt of n before becoming aware of the death 
of D_{i-1}, and since r(D_{i-1}) < r(D) < r(F) this means that B sends the acknowledgment before 
becoming aware of view r(F). So we are done in this case as well. □ 

Proof of the Central Lemma. Assume that there is a message n with VIEW(n) < v_m that is received 
by process P, and let Q be some process that installed view v_m. Denote v_n = VIEW(n). Let 
S = ORIG(n), the process that originally broadcast n. Then there is an effective route (see the 
definition of an effective route and the accompanying lemma) of P_MSG(n) packets: 

S → R_1 → R_2 → R_3 → ... → R_k → P where k > 0 (2) 

To streamline our arguments, we will occasionally refer to S as R_0 and to P as R_{k+1}. By an earlier 
lemma all the processes in the effective route must be members of v_n and 

r(S) < r(R_1) < ... < r(R_k) 

By assumption, P installs view v_m, and gap_{v_m}(P) = 0. From Lemma 32 we know that in order 
to install view v_m process P must first receive a P_FLUSH(v_m) packet from each member 
of LiveSet. It is easy to check that LiveSet = MSet whenever v_gap = 0. Therefore P must receive 
a P_FLUSH(v_m) packet from every member of view v_m. We will use that fact several times in the 
argument below. 

R_{k+1} = P is a member of view v_m. Let i be the smallest integer such that R_i is a member of v_m. 
We will divide the proof into two cases: i > 0 and i = 0. 

We start with the case i > 0. 

Since i > 0 there exists a process R_{i-1} that sends an effective packet P_MSG(n) to R_i, and R_{i-1} is 
not a member of view v_m. Since both are members of view v_n and R_i is a member of view v_m, 
r(R_{i-1}) is in the membership interval of R_i and so it must receive an n_REM(R_{i-1}) packet from the 
membership service. We know that v_n < v_m < r(R_i). We look at the following cases: 

Case I: r(Q) < r(R_i). 

We know that Q installs view v_m, and by the assumption of the current case, Q never receives a 
removal notification for R_i. It follows from an earlier lemma that Q must receive a P_FLUSH(f) packet 
from R_i where f > v_m before installing view v_m. Assigning A = B = Q, D = R_{i-1} and F = R_i, we can 
check that all the conditions of Lemma 37 are met. The only difficulty is with condition 4. There 
we have 

r(D) = r(R_{i-1}) < v_m < f 

R_i would not send a P_FLUSH(f) packet to Q if it were aware of its removal, therefore f < r(Q) = 
r(B). 

Therefore process Q receives n. 

Case II: r(Q) > r(R_i). 

Since process R_i is a member of view v_m, P must receive a P_FLUSH(f) from R_i, where f = v_m, 
before it installs view v_m. Assigning A = P, B = Q, D = R_{i-1} and F = R_i, we can check that all 
the conditions of Lemma 37 are met. Again the difficulty is with condition 4. There we have 

r(D) = r(R_{i-1}) < v_m = f 

f < r(R_i) < r(Q) = r(B) 

Therefore process Q receives n in this case as well. 

We now turn to the case i = 0. 

i = 0 means that process S, the originator of message n, is a member of view v_m. Therefore S 
must send a P_FLUSH(v_m) packet to P. Since Q is a member of view v_m, this flush packet must be 
sent while Q is in LiveSet(S). 

When S broadcasts n, cur_view(S) = v_n and v_gap(S) = 0, because message broadcasts do not 
occur when v_gap > 0. As a result the broadcast occurs before S sends the P_FLUSH(v_m) packet to 
P, and while Q is in LiveSet(S). This implies that S sends a P_MSG(n) packet to Q as part of the 
broadcast, and it includes Q in the instability set of n as it adds its record to WaitSet(S). The 
message n must stabilize with respect to Q before S sends the P_FLUSH(v_m) packet. This flush packet 
is sent before S receives a removal notification for Q. Therefore the stabilization can only occur as 
a result of the receipt of a P_ACK(n) packet from Q. In other words, Q must receive n. 

This concludes the proof of the case i = 0 and of the Central Lemma. □ 